Abstract



Metaquerying is a datamining technology by which hidden dependencies among several database relations 
can be discovered. This tool has already been successfully applied to several real-world applications. Recent 
papers provide only preliminary results about the complexity of metaquerying. In this paper we define 
several variants of metaquerying that encompass, as far as we know, all variants defined in the literature. 
We study both the combined complexity and the data complexity of these variants. We show that, under the 
combined complexity measure, metaquerying is generally intractable (unless P = NP), lying sometimes quite
high in the complexity hierarchies (as high as $\mathrm{NP}^{\mathrm{PP}}$), depending on the characteristics of the plausibility
index. However, we are able to single out some tractable and interesting metaquerying cases (whose combined
complexity is LOGCFL-complete). As for the data complexity of metaquerying, we prove that, in general,
this is in $\mathrm{TC}^0$, but lies within $\mathrm{AC}^0$ in some simpler cases. Finally, we discuss implementation of metaqueries,
by providing algorithms to answer them. 

1 Introduction 

Companies and organizations often possess information resources and databases containing a vast amount of
data waiting to prove useful one day. The expanding datamining research area is supposed to provide tools for
the discovery of valuable knowledge in these huge data resources.

*A partial and preliminary version of this paper appeared in Proceedings of the Nineteenth ACM SIGMOD-SIGACT-SIGART
Symposium on Principles of Database Systems, May 15-17, 2000, Dallas, Texas, USA.
† Contact author.

Metaquerying [?] is a promising approach for datamining in relational and deductive databases. Metaqueries
serve as a generic description of a class of patterns a user is willing to discover. Unlike many other mining tools,
patterns discovered using metaqueries can link information from several tables in databases. These patterns are
all relational, while most machine-learning systems can only learn propositional patterns and work on a single
relation. Metaqueries can be specified by human experts or, alternatively, they can be automatically generated
from the database schema.

Intuitively, a metaquery has the form 

$T \leftarrow L_1, \ldots, L_m$ (1)

where $T$ and the $L_i$ are literal schemes of the form $Q(Y_1, \ldots, Y_n)$, and $Q$ is either an ordinary predicate name or a predicate
variable. In this latter case, $Q(Y_1, \ldots, Y_n)$ can be instantiated to an atom with predicate symbol denoting a
relation in the database. An answer to a metaquery is an (ordinary) rule obtained by consistently substituting
second-order predicates with relation names.

Shen et al. [?] are, to the best of our knowledge, the first who have presented a framework that uses metaqueries
to integrate inductive learning methods with deductive database technology.

As an example (taken from [?]), let $P$, $Q$, and $R$ be predicate variables and DB be a database. Then the
metaquery

$R(X,Z) \leftarrow P(X,Y), Q(Y,Z)$

specifies that the patterns to be discovered are relationships of the form

$r(X,Z) \leftarrow p(X,Y), q(Y,Z)$

where $p$, $q$, and $r$ are relations from DB. For instance, for an appropriate database DB, one possible result of
this metaquery could be the rule

$speaks(X,Z) \leftarrow citizen(X,Y), language(Y,Z)$ (2)

A rule which serves as an answer to a metaquery is usually accompanied by indices that indicate its "plausibility
degree". In [?], for example, each rule in the answer is supplied with support and confidence. The support
indicates how frequently the body of the rule is satisfied, and the confidence measures what fraction of the
tuples that satisfy the body also satisfy the head. A confidence of 0.93, for instance, for the rule (2) means
that out of all pairs $(X, Z)$ that satisfy the body of the rule, 93% also satisfy the head. Admissibility thresholds
for support and confidence are usually provided by the user. Support and confidence have also been defined for
other datamining techniques, such as association rules [?].

Similar to the case of association rules, the plausibility indices are used with two major purposes: 
1. to avoid presenting negligible information to the user,

2. to cut off the search space by early detection of plausibility values.

Formal definitions of metaqueries and indices are given in the next section.



Metaquerying was implemented in some datamining systems [12, ?], and was arguably very useful in
knowledge discovery [?]. However, despite the practical success of metaqueries, their theoretical
foundations appear to be quite unclear. A careful reading of research manuscripts on the subject reveals that the
semantics is not always well defined. In addition, except for a very preliminary NP-hardness result [?], a thorough
analysis of the involved complexities was never performed.

This paper reports a study of some of the theoretical aspects of metaquerying. First, in Section 2, we
provide formal definitions for the syntax and semantics of metaqueries. We define several types of metaqueries
and indices that encompass, as far as we know, all variants defined in the literature.

In Section 3 we study both the combined complexity and the data complexity of all the metaquery variants
that we define. We show that, under the combined complexity measure, metaquerying is intractable in general
(unless P = NP), but we are able to single out a tractable interesting metaquerying case (whose combined
complexity is LOGCFL-complete). The tractable class is a subset of metaquerying problems which we call
acyclic metaqueries. Acyclicity of metaqueries is a generalization of the analogous concept defined for conjunctive
queries [?].

As for the data complexity of metaquerying, we prove that, in general, this is in $\mathrm{TC}^0$, but lies within $\mathrm{AC}^0$ in
some interesting cases.

We then discuss algorithms to implement metaquerying in a reasonably efficient manner.
Finally, we draw some conclusions and report a summary table containing all the complexity
results we prove throughout the paper.



2 Metaquerying 

2.1 Syntax and semantics 

Let $U$ be a countable domain of constants. A database DB is a tuple $(D, R_1, \ldots, R_n)$ where $D \subseteq U$ is finite, and each
$R_i$ is a relation of arity $a(R_i)$ such that $R_i \subseteq D^{a(R_i)}$.

As stated above, a metaquery MQ is a second-order template describing a pattern to be discovered [?].
Such a template has the form

$T \leftarrow L_1, \ldots, L_m$ (3)

where $T$ and the $L_i$ are literal schemes. Each literal scheme $T$ or $L_i$ has the form $Q(Y_1, \ldots, Y_n)$, where $Q$ is either
a predicate (second-order) variable or a relation symbol, and each $Y_j$ ($1 \le j \le n$) is an ordinary (first-order)
variable. If $Q$ is a predicate variable, then $Q(Y_1, \ldots, Y_n)$ is called a relation pattern of arity $n$, otherwise it is
called an atom of arity $n$. The right-hand side $L_1, \ldots, L_m$ is called the body of the metaquery, while $T$ is called
the head of the metaquery. A metaquery is called pure if any two relation patterns with the same predicate
variable have the same arity.

Intuitively, given a database instance DB, answering a metaquery MQ on DB amounts to finding all
substitutions $\sigma$ of relation patterns appearing in MQ by atoms having as predicate names relations in DB,
such that the Horn rule $\sigma(MQ)$ (obtained by applying $\sigma$ to MQ) encodes a dependency between the atoms in
its head and body that holds in DB with a certain degree of plausibility. The plausibility is defined in
terms of indices which we will formally define shortly.

Let MQ be a metaquery and DB a database. Let $pv(MQ)$, $ls(MQ)$, and $rep(MQ)$ denote the set of
predicate variables, the set of literal schemes, and the set of relation patterns occurring in MQ, respectively
(note that $rep(MQ) \subseteq ls(MQ)$). Moreover, let $rel(DB)$ denote the set of relation names of DB and $ato(DB)$
denote the set of all the atoms of the form $p(T_1, \ldots, T_k)$ where $p \in rel(DB)$, $k$ is the arity of $p$, and each $T_i$ is
an ordinary variable.

Semantics is defined via types of metaquery instantiations: an instantiation type specifies how relation 
patterns can be instantiated, turning a metaquery to an ordinary Horn rule over the given database. Next, we 
define three different types of metaquery instantiations, that we call type-0, type-1 and type-2, respectively. In 
the literature, metaquerying semantics has not always been precisely defined, even if a kind of type-2 semantics
is usually assumed (see, e.g., [?]).

Definition 2.1 Let MQ be a metaquery and DB a database. An instantiation (on MQ and DB) is a mapping
$\sigma : rep(MQ) \to ato(DB)$, whose restriction $\sigma' : pv(MQ) \to rel(DB)$ is functional.

The condition above says that predicate names of MQ are consistently substituted with relation names from
DB.

In general, given a set $S$ of literal schemes and an instantiation $\sigma$, $\sigma(S)$ will denote the set of atoms generated
from $S$ by applying $\sigma$ to each relation pattern belonging to $S$. Let DB be a database, and MQ be a metaquery.
In the following, $\sigma(MQ)$ will denote the Horn rule generated by applying $\sigma$ to each relation pattern of MQ.
Whenever we want to remark that $\sigma$ is from $rep(MQ)$ to $ato(DB)$, we will employ the notation $\sigma_{MQ}^{DB}$.

Definition 2.2 Let MQ be a pure metaquery and DB a database. An instantiation $\sigma$ is type-0 if for any
relation pattern $L$ and atom $A$, $\sigma(L) = A$ implies that $L$ and $A$ have the same list of arguments.

That is, under type-0 semantics, each predicate variable is always matched to a relation with the same arity
and ordinary variables are left untouched. As an example, consider the database $DB_1$ shown in Figure 1 and
the metaquery

$R(X,Z) \leftarrow P(X,Y), Q(Y,Z)$ (4)

A possible type-0 instantiation for MQ is

$\sigma = \{\langle R(X,Z), UsPT(X,Z)\rangle, \langle P(X,Y), UsCa(X,Y)\rangle, \langle Q(Y,Z), CaTe(Y,Z)\rangle\}$

which yields the following Horn rule when applied to MQ:

$UsPT(X,Z) \leftarrow UsCa(X,Y), CaTe(Y,Z)$

Definition 2.3 Let MQ be a pure metaquery and DB a database. An instantiation $\sigma$ is type-1 if for any
relation pattern $L$ and atom $A$, $\sigma(L) = A$ implies that the arguments of $A$ are obtained from the arguments of $L$
by permutation.

With type-1 instantiations, variable ordering within relation patterns "does not matter". As an example, under
this semantics, from metaquery (4) and $DB_1$, among others, the following Horn rules could be obtained:

$UsPT(X,Z) \leftarrow UsCa(X,Y), CaTe(Y,Z)$
$UsPT(X,Z) \leftarrow UsCa(Y,X), CaTe(Y,Z)$

The third type of instantiation takes a step further by allowing a relation pattern of arity $k$ to be matched with
an atom of arity $k'$, with $k' \ge k$, padding the "remaining" arguments with free variables:

Definition 2.4 Let MQ be a metaquery and DB a database. An instantiation $\sigma$ is type-2 if for any relation
pattern $L$ and atom $A$, $\sigma(L) = A$ implies the following:

• the arity $k'$ of $A$ is greater than or equal to the arity $k$ of $L$;

• $k$ of the arguments of $A$ coincide with the $k$ arguments of $L$, possibly occurring in different positions;

• the remaining $k' - k$ arguments of $A$ are variables not occurring elsewhere in the instantiated rule.

With type-2 instantiations we can express interesting patterns ignoring how many extra attributes a physical
relation may have. Should the relation UsPT be defined with an additional attribute, as in Figure 2, the
metaquery (4) can be instantiated, using a type-2 instantiation, to

$UsPT(X,Z,T) \leftarrow UsCa(Y,X), CaTe(Y,Z)$



UsCa

User          Carrier
------------  -------
John K.       Omnitel
John K.       Tim
Anastasia A.  Omnitel

CaTe

Carrier  Technology
-------  ----------
Tim      ETACS
Tim      GSM 900
Tim      GSM 1800
Omnitel  GSM 900
Omnitel  GSM 1800
Wind     GSM 1800

UsPT

User          Phone Type
------------  ----------
John K.       GSM 900
John K.       GSM 1800
Anastasia A.  GSM 900

Figure 1: The relations UsCa, CaTe, and UsPT of $DB_1$
UsPT

User          Phone Type  Model
------------  ----------  ----------
John K.       GSM 900     Nokia 6150
John K.       GSM 1800    Nokia 6150
Anastasia A.  GSM 900     Bosch 607

Figure 2: The new relation UsPT



Note that a type-0 instantiation is a type-1 instantiation where the chosen permutation of each relation's attributes
is the identity, whereas a type-1 instantiation is a type-2 instantiation where the arity of the atoms matches
the arity of the relation patterns they are substituted for.

Note, moreover, that type-2 instantiations may apply to any metaquery, while type-0 and type-1 instantiations
require pure metaqueries.
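The difference between type-0 and type-1 instantiations can be made concrete with a small enumeration. The following Python sketch is illustrative only: the function name `instantiations`, the encoding of literal schemes as (predicate variable, argument tuple) pairs, and the restriction to metaqueries consisting solely of relation patterns are assumptions of this example and not part of the formal development.

```python
from itertools import permutations, product


def instantiations(metaquery, schema, inst_type=0):
    """Yield the Horn rules produced by type-0 or type-1 instantiations.

    metaquery: list of (predicate_variable, argument_tuple); entry 0 is the head.
    schema:    dict mapping each relation name to its arity.
    Assumption: every literal scheme is a relation pattern (no ordinary atoms).
    """
    arity = {}
    for pred, args in metaquery:
        # Type-0 and type-1 semantics are defined for pure metaqueries only.
        if arity.setdefault(pred, len(args)) != len(args):
            raise ValueError("metaquery is not pure")
    pred_vars = sorted(arity)
    # A predicate variable may only be mapped to a relation of the same arity.
    candidates = [[r for r, a in schema.items() if a == arity[p]] for p in pred_vars]
    for choice in product(*candidates):
        sigma = dict(zip(pred_vars, choice))  # functional map pv(MQ) -> rel(DB)
        if inst_type == 0:
            # Type-0: argument lists are left untouched.
            yield [(sigma[p], tuple(args)) for p, args in metaquery]
        else:
            # Type-1: each relation pattern may permute its own arguments.
            orders = [sorted(set(permutations(args))) for _, args in metaquery]
            for picked in product(*orders):
                yield [(sigma[p], order) for (p, _), order in zip(metaquery, picked)]
```

On the schema of $DB_1$ (three binary relations), metaquery (4) has $3^3 = 27$ type-0 instantiations and $27 \cdot 2^3 = 216$ type-1 instantiations under this encoding, which already hints at how quickly the search space grows.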

2.2 Plausibility Indices 

In datamining applications, one is generally interested in discovering plausible patterns of data that represent
significant knowledge in the analyzed data sets. In other words, a discovered rule is typically not required
to characterize the given data set entirely, but only a significant subset thereof. This idea is embedded in the usage
of plausibility indices. In the literature, several definitions of plausibility indices are found, e.g., support,
confidence, base and strength [?] (in fact, "support" is similar to "base" and "confidence" is similar to
"strength" [?]). In the analysis that follows, we shall use support, confidence and another useful index that we
call cover. We next provide formal definitions of the indices that we use.

For any set $F$, let $|F|$ denote its size. In the following, we assume the reader is familiar with basic notions
regarding relational algebra and Datalog (see [?] for an excellent source of material about these subjects).
Unless otherwise specified, a predicate name $p$ will denote its corresponding underlying relation as well. In
particular, an atom $p(\mathbf{X})$ will denote the corresponding database relation, where the list of arguments $\mathbf{X}$ is
used, as in Datalog, to positionally refer to $p$'s columns. For a set of atoms $R$, $att(R)$ is the set of all the
variables (attributes) of all the atoms in $R$, and $J(R)$ is the natural join of all the relations corresponding to
atoms in $R$.

Definition 2.5 A plausibility index, or index in short, is a function which maps a database instance DB and
a Horn rule $h(\mathbf{X}) \leftarrow b_1(\mathbf{X}_1), \ldots, b_n(\mathbf{X}_n)$ (defined over DB) to a rational number within $[0, 1]$.

Given an index $I$, a database DB, and a rule $r$ defined over DB, unless confusion may arise, we will employ
the notation $I(r)$ as a shortcut for $I(r, DB)$.

Definition 2.6 Let $R$ and $S$ be two sets of atoms. Then the fraction of $R$ in $S$, denoted $R \uparrow S$, is

$$R \uparrow S = \frac{|\pi_{att(R)}(J(R) \bowtie J(S))|}{|J(R)|}$$

In particular, whenever $|J(R)| = 0$, $R \uparrow S$ is defined equal to 0.

Definition 2.7 Let DB be a database, and $r$ be a Horn rule defined on DB. Let $h(r)$ and $b(r)$ denote the sets
of atoms occurring in the head and in the body of $r$, respectively. Then

• the confidence of $r$ on DB is $cnf(r) = b(r) \uparrow h(r)$,

• the cover of $r$ on DB is $cvr(r) = h(r) \uparrow b(r)$,

• the support of $r$ on DB is $sup(r) = \max_{a \in b(r)} (\{a\} \uparrow b(r))$.

Intuitively, $sup(r)$ measures how much the body (or part of it) of an instantiation contains satisfying tuples.
When an instantiation scores a high support, a pattern search algorithm may conclude that it is worth
considering such an instantiation further, because there is at least one relation with a high percentage of its tuples
satisfying the instantiated body.

When an instantiation scores a high confidence, we can conclude that a high percentage of the assignments
which satisfy the body also satisfy the head relation. Hence, confidence tells how valid the rule is over the
given database. Given a rule $r$, the indices $cnf(r)$ and $sup(r)$ are equivalent to confidence and support as defined in
[?]. In [8], there is also a discussion of the motivations underlying various definitions of support and confidence.

Finally, cover tells what percentage of the tuples of the head relation is implied by the rule. This latter
index, which we define here, is useful in those applications where it is necessary to decide whether it is worth storing the
head relation or computing it in the form of a reasonably matching view, such as in reengineering applications.

As an example, consider the metaquery

$I(X) \leftarrow O(X)$

The (type-2) instantiation (on $DB_1$)

$UsCa(X,Z) \leftarrow UsPT(X,H)$

scores a cover of 1, which tells that the content of the first attribute of UsPT completely encodes the content of
the first attribute of UsCa.
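The three indices can be computed directly from Definitions 2.6 and 2.7. The following Python sketch is a minimal illustration, not an efficient implementation: the helper names `join`, `project`, `att`, and `fraction`, and the encoding of atoms as (relation name, variable tuple) pairs, are assumptions of this example. It uses the fact that $J(R) \bowtie J(S)$ is the natural join of all atoms in $R \cup S$.

```python
from fractions import Fraction

# DB_1 from Figure 1; relations are sets of tuples.
DB1 = {
    "UsCa": {("John K.", "Omnitel"), ("John K.", "Tim"), ("Anastasia A.", "Omnitel")},
    "CaTe": {("Tim", "ETACS"), ("Tim", "GSM 900"), ("Tim", "GSM 1800"),
             ("Omnitel", "GSM 900"), ("Omnitel", "GSM 1800"), ("Wind", "GSM 1800")},
    "UsPT": {("John K.", "GSM 900"), ("John K.", "GSM 1800"), ("Anastasia A.", "GSM 900")},
}


def join(atoms, db):
    """Natural join J(atoms): every variable binding that satisfies all atoms."""
    bindings = [{}]
    for rel, args in atoms:
        extended = []
        for binding in bindings:
            for row in db[rel]:
                candidate = dict(binding)
                # Keep the row only if it agrees with the variables bound so far.
                if all(candidate.setdefault(v, c) == c for v, c in zip(args, row)):
                    extended.append(candidate)
        bindings = extended
    return bindings


def att(atoms):
    """Variables of a set of atoms, in order of first occurrence."""
    seen = []
    for _, args in atoms:
        for v in args:
            if v not in seen:
                seen.append(v)
    return seen


def project(bindings, variables):
    return {tuple(b[v] for v in variables) for b in bindings}


def fraction(R, S, db):
    """R ↑ S = |π_att(R)(J(R) ⋈ J(S))| / |J(R)|, defined as 0 when J(R) is empty."""
    j_r = project(join(R, db), att(R))
    if not j_r:
        return Fraction(0)
    return Fraction(len(project(join(R + S, db), att(R))), len(j_r))


def cnf(head, body, db):
    return fraction(body, [head], db)


def cvr(head, body, db):
    return fraction([head], body, db)


def sup(head, body, db):
    return max(fraction([a], body, db) for a in body)
```

For the rule $UsPT(X,Z) \leftarrow UsCa(X,Y), CaTe(Y,Z)$ on $DB_1$, the body join has 7 tuples, 5 of which also satisfy the head, so this sketch gives a confidence of 5/7, while cover and support are both 1.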

In the sequel, the set of plausibility indices $\{cnf, cvr, sup\}$ will be denoted by $\mathbf{I}$.

3 Complexity of metaquerying 

In this section we present several results regarding the complexity of metaquerying. We assume the reader
is familiar with basic notions regarding complexity classes and, in particular, the definition of the polynomial
hierarchy [?]. Next, we recall the definitions of the classes $\mathrm{AC}^0$, $\mathrm{TC}^0$, $\#\mathrm{AC}^0$, $\mathrm{GapAC}^0$, $\mathrm{PAC}^0$, and LOGCFL,
which will be used in the sequel.
3.1 Preliminaries

Definition 3.1 Let $C$ be a complexity class. Then $\mathrm{NP}^C$ is the class of problems decidable in NP using an oracle
solving a problem in $C$ (see [?]).

Definition 3.2 A conjunctive query is a set of atoms $\{r_1(\mathbf{X}_1), \ldots, r_n(\mathbf{X}_n)\}$, where $\mathbf{X}_1, \ldots, \mathbf{X}_n$ are lists of
variables and/or constants. Let DB be a database instance. The problem of satisfying a conjunctive query
(Boolean Conjunctive Query satisfaction problem, or BCQ) is the problem of deciding if there exists a substitution
$\rho$ for the variables in $\mathbf{X}_i$, $1 \le i \le n$, such that, for each $i$, $1 \le i \le n$, $r_i(\rho(\mathbf{X}_i)) \in DB$. The set
$\{r_i(\rho(\mathbf{X}_i)), 1 \le i \le n\} = \{r_i(\mathbf{x}_i), 1 \le i \le n\}$ is called a ground instance of $\{r_1(\mathbf{X}_1), \ldots, r_n(\mathbf{X}_n)\}$.

The BCQ problem was proved to be NP-complete in [?].
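To fix ideas, the BCQ problem of Definition 3.2 can be decided by a simple backtracking search. The Python sketch below is a naive illustration (worst-case exponential, in line with NP-completeness); the marker class `Var` and the function name `bcq` are assumptions of this example.

```python
class Var(str):
    """Marker type: a Var is a query variable, any other value is a constant."""


def bcq(query, db, binding=None):
    """Return True iff some substitution maps every atom of `query` into `db`.

    query: list of (relation_name, argument_tuple); arguments are Var or constants.
    db:    dict mapping relation names to sets of tuples.
    """
    binding = {} if binding is None else binding
    if not query:
        return True  # every atom has been matched: a ground instance exists
    (rel, args), rest = query[0], query[1:]
    for row in db[rel]:
        candidate = dict(binding)
        matches = all(
            candidate.setdefault(a, c) == c if isinstance(a, Var) else a == c
            for a, c in zip(args, row)
        )
        if matches and bcq(rest, db, candidate):
            return True
    return False
```

For instance, over the relations UsCa and CaTe of Figure 1, the query $\{UsCa(X, Y), CaTe(Y, \text{ETACS})\}$ is satisfied by mapping $X$ to John K. and $Y$ to Tim.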

Definition 3.3 MAJORITY gates are unbounded fan-in gates (with binary input and output) that output 1 
if and only if more than half of their inputs are non-zero. 

Definition 3.4 A family $\{C_i\}$ of boolean circuits, such that $C_i$ accepts strings of size $i$, is uniform if there
exists a Turing machine $T$ which, on input $i$, produces the circuit $C_i$. $\{C_i\}$ is said to be logspace uniform if $T$
carries out its work using $O(\log i)$ space. Define $\mathrm{AC}^0$ (resp. $\mathrm{TC}^0$) as the class of decision problems solved by
logspace uniform families of circuits of polynomial size and constant depth, with AND, OR, and NOT (resp.
MAJORITY and NOT) gates with unbounded fan-in [?].

In the following, unless otherwise specified, we will assume that we deal with logspace uniform families of circuits.
The class $\mathrm{TC}^0$ is of special interest, since it characterizes the computational complexity of such important
operations as multiplication, division, and sorting, as well as being a computational model for neural nets. $\mathrm{TC}^0$
was also characterized as being the class of languages that arises in several ways from counting the number of
accepting subtrees of $\mathrm{AC}^0$ circuits [?].

Definition 3.5 For any $k > 0$, $\#\mathrm{AC}^0_k$ is the class of functions $f : \{0,1\}^* \to \mathbb{N}$ computed by depth-$k$, polynomial-size
uniform families of circuits with $+$ and $\times$ gates (the usual arithmetic sum and product in $\mathbb{N}$) having unbounded
fan-in, where each value incoming into the circuit can be either a constant (where the allowed constant values are
1 and 0) or an input value in the form $x_i$ or $1 - x_i$ (where the allowed input values are 1 and 0). Let

$$\#\mathrm{AC}^0 = \bigcup_{k>0} \#\mathrm{AC}^0_k.$$

Thus, $\#\mathrm{AC}^0$ circuits accept the values 1 and 0 as inputs, but they are considered as natural numbers.

Definition 3.6 $\mathrm{GapAC}^0$ is the class of all functions $f : \{0,1\}^* \to \mathbb{Z}$ that can be expressed as the difference of
two functions in $\#\mathrm{AC}^0$ [?].

Definition 3.7 $\mathrm{PAC}^0$ is the class of languages $\{A \mid \exists f \in \mathrm{GapAC}^0,\ x \in A \Leftrightarrow f(x) > 0\}$ [?].

Proposition 3.8 Under logspace uniformity, $\mathrm{PAC}^0 = \mathrm{TC}^0$ [?].

Definition 3.9 LOGCFL coincides with the class of decision problems logspace-reducible to a context-free
language [?].

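Definition 3.10 Let $H(\mathbf{x}, \mathbf{y})$ be a formula with two free variable lists $\mathbf{x}$ and $\mathbf{y}$ and let $k$ be a natural number.
The counting quantifier $\mathsf{C}$ is defined as

$$\mathsf{C}^{k}_{\mathbf{y}}\, H(\mathbf{x}, \mathbf{y}) \iff |\{\mathbf{y} : H(\mathbf{x}, \mathbf{y}) \text{ is true}\}| \ge k$$

As an example,

$$\mathsf{C}^{k}_{\mathbf{y}}\, [\mathbf{y} \text{ encodes a Hamiltonian path on the graph } \mathbf{x}]$$

is true iff the number of Hamiltonian paths on $\mathbf{x}$ is at least $k$. A polynomial-bounded version of the counting
quantifier is defined as follows.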

Definition 3.11 Let $\mathcal{C}$ be a class of languages. A language $L'$ belongs to the class $\mathsf{C}\mathcal{C}$ iff there exists a language
$L \in \mathcal{C}$, a polynomial-time computable function $f$, and a polynomial $p$ such that

$$x \in L' \iff \mathsf{C}^{f(x)}_{|y| \le p(|x|)}\, [(x, y) \in L]$$

The polynomial-bounded counting quantifier embeds the polynomial-bounded existential quantifier as follows:
a language $L'$ belongs to the class $\exists\mathcal{C}$ iff there exists a language $L \in \mathcal{C}$ and a polynomial $p$ such that

$$x \in L' \iff \mathsf{C}^{1}_{|y| \le p(|x|)}\, [(x, y) \in L]$$

Definition 3.12 The language $\exists\mathsf{C}$-SAT includes all tuples $(F(\mathbf{x}_1, \mathbf{x}_2), m)$ (where $\mathbf{x}_1$ and $\mathbf{x}_2$ are lists of propositional
variables) such that $F$ is a boolean expression in conjunctive normal form and

$$\mathsf{C}^{1}_{\mathbf{x}_1} \mathsf{C}^{m}_{\mathbf{x}_2}\, F(\mathbf{x}_1, \mathbf{x}_2) \text{ is true}$$



Theorem 3.13 $\exists\mathsf{C}$-SAT and $\exists\mathsf{C}$-3SAT are $\exists\mathsf{C}\mathrm{P}$-complete.

Let PP be the set of languages $L$ recognized by a nondeterministic Turing machine $M$ as follows: $x \in L$ iff at
least one half of the computation paths of $M(x)$ end in an accepting state. It is known that $\exists\mathsf{C}\mathrm{P}$ coincides
with $\mathrm{NP}^{\mathrm{PP}}$ [?]. $\exists\mathsf{C}\mathrm{P}$ belongs to the so-called polynomial counting hierarchy: it is contained in PSPACE, and
(maybe properly) contains, among other classes, $\mathsf{C}\mathrm{P}$ [?].
Definition 3.14 A counting Turing machine (CTM) $CM$ is an ordinary nondeterministic Turing machine
whose output, on the input $x$, is the number of accepting computations of $CM(x)$. The time complexity of a
CTM is defined to be $f(n)$ if the longest accepting computation associated with the set of all inputs of size $n$
takes $f(n)$ steps (see [?]).

Definition 3.15 Define #P to be the set of all functions that are computable by polynomial-time CTMs.

Definition 3.16 A problem $X$ is #P-hard if there are polynomial-time Turing reductions from all problems
in #P to $X$. If $X$ is #P-hard and $X \in$ #P, then $X$ is said to be #P-complete (see [?]).

Definition 3.17 A parsimonious transformation [?] is a polynomial transformation $f$ from a problem $X$ to
a problem $Y$ such that, if $\#(X, x)$ is defined to be the number of solutions that instance $x$ has in problem $X$,
then $\#(X, x) = \#(Y, f(x))$.

3.2 Complexity Measures 

As for the case of ordinary queries, the complexity of metaqueries can be defined according to two complexity
measures: combined complexity and data complexity [?]. Complexity measures are defined next.

Let $T \in \{0,1,2\}$ be an instantiation type and let $I$ be a plausibility index. Let DB denote a database
instance, MQ a metaquery, and $k$ a rational threshold value such that $0 \le k < 1$. Then:

1. The combined complexity of $\langle DB, MQ, I, k, T \rangle$ is the complexity, measured in the size of DB, MQ and
$k$, of deciding if there exists a type-$T$ instantiation $\sigma$ such that $I(\sigma(MQ)) > k$.

2. Assuming that a database schema $DS = (D, R_1, \ldots, R_n)$ has been fixed in advance, the data complexity of
the metaquery problem $\langle DB, MQ, I, k, T \rangle$ is the complexity, measured in the size of DB, of deciding if
there exists a type-$T$ instantiation $\sigma$ such that $I(\sigma(MQ)) > k$, where DB is a database with schema
DS.

In the literature it is usually assumed that one looks for rules that satisfy some metaquery and have two
plausibility indices, usually support and confidence, above some given threshold [?]. Here we split the
metaquery problem so that it relates to one plausibility index at a time. The rationale is that this allows us to
single out complexity sources more precisely and, at the same time, complexity measures for problems involving
more than one index can be obtained fairly easily from metaquerying problems involving only one index.

3.3 Combined Complexity 

It immediately follows from the proof of Theorem 2.2 in [?] that the combined complexities of $\langle DB, MQ, sup, 0, 1 \rangle$
and $\langle DB, MQ, cnf, 0, 1 \rangle$ are NP-hard. In this section, we generalize these results by stating the combined
complexities of metaquerying for various indices and instantiation types.

A first characterization of the problem is given by the following result.

Proposition 3.18 Let $I$ be an index, and $C$ be the complexity of deciding the following question: "Given a
Horn rule $r$, a database DB and a finitely represented rational value $k \in [0, 1)$, is $I(r) > k$ over DB?". Then,
for any instantiation type $T$, the complexity of $\langle DB, MQ, I, k, T \rangle$ is in $\mathrm{NP}^C$.

Proof. Simply note that the evaluation of an instance of $\langle DB, MQ, I, k, T \rangle$ can be done by guessing an instantiation
$\sigma$, and then calling a $C$ oracle to decide if $I(\sigma(MQ)) > k$. □



In many cases, we are able to state tighter bounds than those implied by Proposition 3.18. We begin by
analyzing the cases when the threshold value $k$ is set to 0. This kind of metaquerying problem has
limited practical usefulness. However, such problems provide interesting lower bounds for the complexity of metaquerying
problems involving thresholds larger than 0.

Definition 3.19 Let DB be a database, and $r = b_1(\mathbf{X}_1) \leftarrow b_2(\mathbf{X}_2), \ldots, b_n(\mathbf{X}_n)$ be a Horn rule defined over DB.
Let $I$ be an index, and let $A_r = \{b_1(\mathbf{X}_1), b_2(\mathbf{X}_2), \ldots, b_n(\mathbf{X}_n)\}$ be the set of atoms occurring in $r$. A certifying set
for $I$ is a set $S \subseteq A_r$ such that there exists a ground instance $s$ of $S$ which is satisfied in DB iff $I(r) > 0$.

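Proposition 3.20 Consider a rule $r = h(\mathbf{X}) \leftarrow b_1(\mathbf{X}_1), \ldots, b_n(\mathbf{X}_n)$. Then $S_{cvr} = \{h(\mathbf{X}), b_1(\mathbf{X}_1), \ldots, b_n(\mathbf{X}_n)\}$,
$S_{sup} = \{b_1(\mathbf{X}_1), \ldots, b_n(\mathbf{X}_n)\}$ and $S_{cnf} = S_{cvr}$ are certifying sets for $cvr$, $sup$ and $cnf$, respectively.

Proof. As for cover, consider a rule $r = h(\mathbf{X}) \leftarrow b_1(\mathbf{X}_1), \ldots, b_n(\mathbf{X}_n)$ defined over a database DB. Since $cvr(r)$
is defined as

$$cvr(r) = \frac{|\pi_{\mathbf{X}}(h(\mathbf{X}) \bowtie b_1(\mathbf{X}_1) \bowtie \cdots \bowtie b_n(\mathbf{X}_n))|}{|h(\mathbf{X})|}$$

it is immediate to note that $h(\mathbf{X}) \bowtie b_1(\mathbf{X}_1) \bowtie \cdots \bowtie b_n(\mathbf{X}_n)$ is not empty iff there exists a ground instance
of $\{h(\mathbf{X}), b_1(\mathbf{X}_1), \ldots, b_n(\mathbf{X}_n)\}$ which is satisfied with respect to DB (cf. Definition 3.2). Proofs concerning
confidence and support are similar. □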

Theorem 3.21 Let $I \in \mathbf{I}$ be a plausibility index. The combined complexity of $\langle DB, MQ, I, 0, T \rangle$ is NP-complete,
for any instantiation type $T \in \{0, 1, 2\}$.

Proof. Hardness. We use a reduction from the graph 3-COLORING problem, that is, given an undirected
graph $G = (V, E)$ and three colors $\{1,2,3\}$, is it possible to assign a color to each node of $G$ such that no
two adjacent nodes have the same color?

The reduction uses a database $DB_{3col}$ and a metaquery $MQ_{3col}$. $DB_{3col}$ includes a single binary relation

$e = \{(1,2), (1,3), (2,3), (2,1), (3,1), (3,2)\}$

denoting all the possible ways of correctly coloring two adjacent nodes. In $MQ_{3col}$ we use ordinary variables
$X_u$, one for each node $u \in V$. $MQ_{3col}$ contains the set of literals $S = \{E(X_{u_1}, X_{u_2}), \ldots, E(X_{u_{2k-1}}, X_{u_{2k}})\}$,
where $\{(u_1,u_2), \ldots, (u_{2k-1},u_{2k})\}$ is the set of edges of $G$, that is, $S$ encodes $G$ as a set of literals. $MQ_{3col}$ is
as follows:

$E(X_{u_1}, X_{u_2}) \leftarrow E(X_{u_1}, X_{u_2}), \ldots, E(X_{u_{2k-1}}, X_{u_{2k}})$

Claim 3.22 $G$ has a 3-coloring iff for any type $T$, there exists a type-$T$ instantiation $\sigma$ such that $\sigma(MQ_{3col})$
has a ground instance $s$ which is satisfied with respect to $DB_{3col}$.



Proof. (Claim 3.22) ($\Rightarrow$). Suppose that $G$ has a 3-coloring $c : V \to \{1, 2, 3\}$, defined for each $u \in V$, and such
that if $(u, v) \in E$ then $c(u) \ne c(v)$. To show this direction, it is enough to consider an instantiation $\sigma$ which
maps each literal of the form $E(X, Y)$ to an atom of the form $e(X, Y)$. $\sigma$ is type-0 and, therefore, also type-1
and type-2.

($\Leftarrow$). Since $S$ is an encoding of the edges of $G$, $G$ is undirected and $e$ is the only relation in $DB_{3col}$, $s$ being
true over $DB_{3col}$ implies that $G$ is 3-colorable. This closes the proof of Claim 3.22. □

The hardness part of the proof is completed by noting that, by Proposition 3.20 and by the definition of
$MQ_{3col}$, given an instantiation $\sigma$ which maps $E$ to $e$, $\sigma(S)$ is a certifying set for $cvr$, $sup$, and $cnf$.

Membership. Assume $I = cvr$. The proof consists in providing evidence of the existence of a succinct certificate
for the problem. In order to check if $\langle DB, MQ, I, 0, T \rangle$ is a YES instance, we guess an instantiation
$\sigma$ of type $T$ and a substitution $\rho$ for ordinary variables, and check if $\rho(\sigma(MQ))$ is true in DB. By Definition
3.19, $\rho(\sigma(MQ))$ is true in DB iff $I(\sigma(MQ)) > 0$. Proofs concerning support and confidence are similar. □
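The reduction used in the hardness part can be checked on small graphs. The Python sketch below is only an illustration under stated assumptions: the function names `body_of_mq3col`, `satisfiable`, and `three_colorable` are hypothetical, and satisfiability of the instantiated body is tested by brute force over all substitutions rather than by any efficient procedure.

```python
from itertools import product

# The single relation e of DB_3col: all legal colorings of two adjacent nodes.
E_REL = {(1, 2), (1, 3), (2, 3), (2, 1), (3, 1), (3, 2)}


def body_of_mq3col(edges):
    """Encode the edges of G as the literals E(X_u, X_v) of MQ_3col."""
    return [("E", (f"X{u}", f"X{v}")) for u, v in edges]


def satisfiable(atoms, db):
    """Brute-force check that some substitution grounds all atoms inside db."""
    variables = sorted({v for _, args in atoms for v in args})
    domain = sorted({c for rows in db.values() for row in rows for c in row})
    for values in product(domain, repeat=len(variables)):
        rho = dict(zip(variables, values))
        if all(tuple(rho[v] for v in args) in db[rel] for rel, args in atoms):
            return True
    return False


def three_colorable(edges):
    """Decide 3-colorability through the type-0 instantiation mapping E to e."""
    sigma = {"E": "e"}
    instantiated = [(sigma[p], args) for p, args in body_of_mq3col(edges)]
    return satisfiable(instantiated, {"e": E_REL})
```

A satisfying substitution assigns a color to each variable $X_u$, so on a triangle the check succeeds, whereas on the complete graph with four nodes it fails, in agreement with Claim 3.22.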



Remark. Our proof outlines a general technique which can be employed to prove analogous results for all
those indices $I$ having the following properties:

• Given a rule $r = h(\mathbf{X}_0) \leftarrow b_1(\mathbf{X}_1), \ldots, b_n(\mathbf{X}_n)$, a certifying set $S_I$ can be built in polynomial time, and

• $O(|S_I|) \ge O(n^{1/k})$, for some $k \ge 1$.

Next, we consider the combined complexity of metaquerying when the fixed threshold $k$ is such that $0 \le k < 1$.

Proposition 3.23 The combined complexity of $\langle DB, MQ, I, k, T \rangle$ is NP-hard for any plausibility index $I \in \mathbf{I}$.

Proof. Immediate from Theorem 3.21. □



Theorem 3.24 The combined complexity of $\langle DB, MQ, I, k, T \rangle$, with $0 \le k < 1$, $T \in \{0, 1, 2\}$ and $I \in \{cvr, sup\}$, is in NP.






Proof. Consider an instance of $\langle DB, MQ, cvr, k, T \rangle$. $I(\sigma(MQ))$ has the form $|A|/|B|$, where $B$ is the head relation
of $\sigma(MQ)$ and $A$ is the join of the body and the head of $\sigma(MQ)$ projected on the attributes of the head
relation, for $\sigma$ a given instantiation of type $T$. In order to verify if the given instance is a YES instance,
it is then sufficient to guess a type-$T$ instantiation and verify that, over DB, $|A| > \lfloor k|B| \rfloor$. $\lfloor k|B| \rfloor$ can be
computed in polynomial time. A succinct certificate for the problem at hand consists of the instantiation $\sigma$,
and of $\lfloor k|B| \rfloor + 1$ substitutions $\rho_i$ of ordinary variables occurring in $\sigma(MQ)$, distinct from one another as far as
head ordinary variables are concerned, such that $cvr(\sigma(MQ)) > k$ is proved. This can be checked in polynomial
time. Checking distinctness of the $\lfloor k|B| \rfloor + 1$ substitutions can obviously also be done in polynomial time.

Consider an instance of $\langle DB, MQ, sup, k, T \rangle$. $I(\sigma(MQ))$ has the form $\max_i \{|A_i|/|B_i|\}$, where $B_i$ is a relation
in the body of $\sigma(MQ)$ and $A_i$ is the join of the body of $\sigma(MQ)$ projected on the attributes of $B_i$, for $\sigma$ a
given instantiation of type $T$. The proof follows the same line of reasoning as above, with the difference that,
here, the certificate must also include an index $j$ such that $|A_j|/|B_j| > k$. □

As for confidence, it is not difficult to show a PSPACE upper bound on its combined complexity and we can
deduce, from Theorem 3.21, that the problem is NP-hard as well.

A more in-depth analysis shows that the issue of measuring confidence of a given metaquery instantiation 
turns out to be actually more complex than measuring the other indices. This is due to the need of computing
the exact count of tuples satisfying the body of an instantiation, whereas for other indices this is not required. 
In fact, the problem of deciding if confidence exceeds a given threshold on some instantiation of a metaquery is 
related to problems where the question concerns counting the exact number of solutions of a given instance, as 
we show below. 

Given a boolean formula $F$ (resp. a boolean formula $F$ with at most three literals per clause) in conjunctive
normal form, let #SAT (resp. #3SAT) be the problem of finding the number of satisfying assignments for $F$.

Theorem 3.25 #SAT and #3SAT are #P-complete [?].

Many problems are known to be #P-complete [13], some of which are the counting counterparts of NP-complete
problems. However, it is also known that counting solutions for some of the problems in P is as hard as
counting solutions of NP-complete problems [?]. A useful tool to show #P-completeness for counting versions
of NP-complete problems is that of parsimonious transformations (defined in Definition 3.17).

Proposition 3.26 Let $Q$ be a conjunctive query $\{r_1(\mathbf{X}_1), \ldots, r_n(\mathbf{X}_n)\}$, where $\mathbf{X}_1, \ldots, \mathbf{X}_n$ are lists of variables and/or constants. Let #BCQ be the problem of counting how many substitutions $\rho$ for the variables of $\mathbf{X}_i$, $1 \le i \le n$, there are such that, for each $i$, $1 \le i \le n$, $\rho(r_i(\mathbf{X}_i)) \in DB$ (cf. Definition S.i). #BCQ is #P-complete.



Proof. We show a transformation from 3SAT to BCQ that preserves the number of solutions. Let $F = (x_{11} \vee x_{12} \vee x_{13}) \wedge \cdots \wedge (x_{n1} \vee x_{n2} \vee x_{n3})$ be a 3SAT instance, where the $x_{ij}$'s ($1 \le i \le n$, $1 \le j \le 3$) are (not necessarily distinct) literals. We build a conjunctive query $Q = c_1(X_{11}, X_{12}, X_{13}), \ldots, c_n(X_{n1}, X_{n2}, X_{n3})$ and a database $DB = \{c_1, \ldots, c_n\}$ as follows: each variable $X_{ij}$ of $Q$ is associated with the literal $x_{ij}$ (independently of $x_{ij}$ being negative or not). As for DB, each relation $c_i \in DB$ is ternary, and the set of constants of DB is $U = \{0, 1\}$. Let $d_i = x_1 \vee x_2 \vee x_3$ be the $i$-th clause of $F$. We fill $c_i$ with the tuples corresponding to satisfying assignments of $d_i$, i.e.

$c_i = U^3 - \{(d_1, d_2, d_3)\}$

where each $d_j$ ($j \in \{1, 2, 3\}$) is 0 if $x_j$ is positive, and 1 otherwise. Of course, the total number of tuples included into $c_i$ is constant. It is then immediate to see that the number of satisfying assignments for $F$ coincides with the number of satisfying substitutions for $Q$ in DB. □
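The transformation in the proof above can be illustrated with a small brute-force sketch. It is only an illustration under stated assumptions: the encoding of a literal as a pair `(variable, is_positive)` and the function names `clause_relation`, `count_bcq` and `count_sat` are hypothetical and do not come from the paper, and both counters enumerate all assignments, so they are usable only on tiny formulas.

```python
from itertools import product

def clause_relation(clause):
    """Relation c_i: all tuples of {0,1}^3 except the one falsifying the clause.

    A literal is encoded (by assumption) as a pair (variable, is_positive).
    """
    falsifying = tuple(0 if positive else 1 for _, positive in clause)
    return {t for t in product((0, 1), repeat=3) if t != falsifying}

def count_bcq(formula):
    """Count substitutions of the query variables that satisfy every atom c_i."""
    variables = sorted({var for clause in formula for var, _ in clause})
    relations = [clause_relation(clause) for clause in formula]
    count = 0
    for values in product((0, 1), repeat=len(variables)):
        rho = dict(zip(variables, values))
        if all(tuple(rho[var] for var, _ in clause) in rel
               for clause, rel in zip(formula, relations)):
            count += 1
    return count

def count_sat(formula):
    """Count satisfying truth assignments by evaluating the clauses directly."""
    variables = sorted({var for clause in formula for var, _ in clause})
    count = 0
    for values in product((False, True), repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(any(assignment[var] == positive for var, positive in clause)
               for clause in formula):
            count += 1
    return count

# F = (a or b or e) and (not a or e or d)
F = [(("a", True), ("b", True), ("e", True)),
     (("a", False), ("e", True), ("d", True))]
print(count_bcq(F), count_sat(F))
```

Both counters are expected to agree on any 3-CNF formula given in this encoding, which is the sense in which the transformation preserves the number of solutions.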



Theorem 3.27 The combined complexity of $\langle DB, MQ, cnf, k, T \rangle$ is in $NP^{\#P}$.



Proof. We follow the same line of reasoning as in the proof of Theorem 3.24, but here we take advantage of a #BCQ oracle. $cnf(\sigma(MQ))$ has the form $|A| / |B|$, where $B$ is the join of all body atoms of $\sigma(MQ)$ and $A$ is the join of the body and the head of $\sigma(MQ)$ projected on the attributes of the head relation, for $\sigma$ a given instantiation of type $T$. In order to verify whether the given problem instance is a YES instance, it is then sufficient to guess a type-$T$ instantiation $\sigma$ and verify that, over DB, $|A| > \lfloor k|B| \rfloor$. A #BCQ oracle can be queried for the values of $|B|$ and $|A|$. A succinct certificate for the problem at hand then consists of the instantiation $\sigma$, and of values $a$ for $|A|$ and $b$ for $|B|$, such that $cnf(\sigma(MQ)) > k$. □

Theorem 3.28 The combined complexity of $\langle DB, MQ, cnf, k, 0 \rangle$ is $NP^{PP}$-complete.



Proof. (Membership). It is known that #P is as powerful as PP when employed as an oracle, hence $NP^{\#P} = NP^{PP}$ [10]. Membership then immediately follows from Theorem 3.27. (Hardness). We show a reduction from ∃C-3SAT to $\langle DB, MQ, cnf, k, 0 \rangle$.

Suppose an instance of ∃C-3SAT is given, i.e.:

• a formula $F = \bigwedge_{i=1}^{n} c_i$, where each $c_i$ is a three-literal clause $l_{i_1} \vee l_{i_2} \vee l_{i_3}$ (with each $l_{i_j} \in \{p_1, \ldots, p_s, q_1, \ldots, q_h, \neg p_1, \ldots, \neg p_s, \neg q_1, \ldots, \neg q_h\}$),

• a partition of the variables of $F$ into two sets $\Pi = \{p_1, \ldots, p_s\}$ and $\chi = \{q_1, \ldots, q_h\}$, and

• an integer $k'$,

and the question is the following: is there an assignment for the variables $p_1, \ldots, p_s$ such that at least $k'$ assignments for the variables $q_1, \ldots, q_h$ make $F$ true?

We build a database $DB_{csat}$ and a metaquery $MQ_{csat}$ as follows. The domain of $DB_{csat}$ is the set of symbols {1,0, ^}. $DB_{csat}$ contains:

• a relation $p^1(X, \bar{X}, Y) = \{(1, 0, 1)\}$ and a relation $p^0(X, \bar{X}, Y) = \{(0, 1, 1)\}$, intended to deal with variables within $\Pi$;

• a relation $q(X, \bar{X}) = \{(1, 0), (0, 1)\}$, intended to deal with variables within $\chi$;



• a relation

$c'(L_1, L_2, L_3, C) = \{(1,0,0,1), (0,1,0,1), (0,0,1,1), (1,0,1,1), (1,1,0,1), (0,1,1,1), (1,1,1,1), (0,0,0,0)\}$

which encodes the possible ways of assigning boolean values to clause literals and, then, to the corresponding clause;

• the relation $c(C_1, \ldots, C_n) = \{(1, \ldots, 1)\}$, intended to select satisfying boolean assignments only.

As for $MQ_{csat}$, we introduce, for each boolean variable $p_i$ (resp. $q_i$) of $\Pi$ (resp. $\chi$), two variables $P_i$ and $\bar{P}_i$ (resp. $Q_i$ and $\bar{Q}_i$). $MQ_{csat}$ is as follows:

$c(C_1, \ldots, C_n) \leftarrow P'_1(P_1, \bar{P}_1, Y), \ldots, P'_s(P_s, \bar{P}_s, Y), q(Q_1, \bar{Q}_1), \ldots, q(Q_h, \bar{Q}_h), c'(L_{1_1}, L_{1_2}, L_{1_3}, C_1), \ldots, c'(L_{n_1}, L_{n_2}, L_{n_3}, C_n)$

where each $L_{i_j}$ ($1 \le i \le n$, $1 \le j \le 3$) will be

• either $P_y$ if $l_{i_j} = p_y$ and $p_y$ belongs to $\Pi$, or $\bar{P}_y$ if $l_{i_j} = \neg p_y$ and $p_y$ belongs to $\Pi$;

• either $Q_y$ if $l_{i_j} = q_y$ and $q_y$ belongs to $\chi$, or $\bar{Q}_y$ if $l_{i_j} = \neg q_y$ and $q_y$ belongs to $\chi$.

Finally, we set $k$ equal to $k'/2^h$.

As an example, suppose $F = (a \vee b \vee e) \wedge (\neg a \vee e \vee d)$, $\Pi = \{a, b\}$ and $\chi = \{d, e\}$. In this case, $MQ_{csat}$ is

$c(C_1, C_2) \leftarrow A'(A, \bar{A}, Y), B'(B, \bar{B}, Y), q(D, \bar{D}), q(E, \bar{E}), c'(A, B, E, C_1), c'(\bar{A}, E, D, C_2)$

Roughly speaking, this reduction works as follows: the generic relation $c'(\_, \_, \_, C_i)$ expresses all the possible value assignments to the variables of a clause $c_i$, and the corresponding patterns within $MQ_{csat}$ encode the structure of $F$; the predicate variables $P'_i$ guess a value, either true or false, for each literal within $\Pi$, whereas, through the atoms with $q$ as the predicate symbol, we encode, in order to count them, all the possible assignments of literals within $\chi$. In order to exclude contributions to the confidence value from instantiations which map some predicate variable $P'_j$ to $q$, $p^0$ and $p^1$ both have arity equal to three. The join of the body atoms, for a given instantiation $\sigma$, "computes" the possible assignments of the variables $Q_i$ for a fixed configuration of the variables $P_i$; when the body of $\sigma(MQ_{csat})$ is joined with the head atom, the result set captures those assignments of the $Q_i$'s which make $F$ true. Confidence will exceed $k = k'/2^h$ iff there are at least $k'$ assignments for the $Q_i$'s which make $F$ true.

More formally, suppose that $\langle F, k', \Pi, \chi \rangle$ is a YES instance of ∃C-3SAT. Consider a truth assignment $\rho$ over the variables $p_j$ ($1 \le j \le s$) which makes $\langle F, k', \Pi, \chi \rangle$ a YES instance. A type-0 instantiation $\sigma$ which makes $cnf(\sigma(MQ_{csat})) > k'/2^h$ can be built by setting, for each $p_j$ ($1 \le j \le s$), $P'_j$ to $p^1$ if $\rho(p_j) = true$, and setting $P'_j$ to $p^0$ otherwise. The join $J$ of the body atoms of $\sigma(MQ_{csat})$ contains precisely $2^h$ tuples. Moreover, if there are at least $k'$ satisfying assignments for the variables within $\chi$, joining the head relation with $J$ will result in a relation $J_h$ (which "selects" only those tuples of $J$ such that each $C_i$ variable is set to 1, i.e. the clause $c_i$ is satisfied) containing at least $k'$ tuples. Hence $cnf(\sigma(MQ_{csat})) > k$.

Conversely, suppose that we have a type-0 instantiation $\sigma$ for which

$cnf(\sigma(MQ_{csat})) > k'/2^h.$

Obviously, the predicate variables $P'_i$ ($1 \le i \le s$) are mapped by $\sigma$ to either $p^0$ or $p^1$. In order to build an assignment $\rho$ which makes $\langle F, k', \Pi, \chi \rangle$ a YES instance, let $\rho(p_i) = true$ if, within $\sigma$, $P'_i \mapsto p^1$, and $\rho(p_i) = false$ if $P'_i \mapsto p^0$. Observe, then, that the join $J$ of the body atoms of $\sigma(MQ_{csat})$ has precisely $2^h$ tuples. To have $cnf(\sigma(MQ_{csat})) > k'/2^h$, the join $J_h$ of the body and head atoms of $\sigma(MQ_{csat})$ must have at least $k'$ tuples, each of which represents one of the $k'$ assignments for the variables within $\chi$ that make $F$ true. This closes the proof. □

Theorem 3.29 The combined complexity of $\langle DB, MQ, cnf, k, 1 \rangle$ and $\langle DB, MQ, cnf, k, 2 \rangle$ is $NP^{PP}$-complete.

Proof. The proof uses a slightly different reduction from the one used in the proof of the previous theorem, because here we have to deal with the permutation/addition of ordinary variables through instantiations. Suppose an instance of ∃C-3SAT is given, i.e.

• a formula $F = \bigwedge_{i=1}^{n} c_i$, where each $c_i$ is a three-literal clause $l_{i_1} \vee l_{i_2} \vee l_{i_3}$ (with each $l_{i_j} \in \{p_1, \ldots, p_s, q_1, \ldots, q_h, \neg p_1, \ldots, \neg p_s, \neg q_1, \ldots, \neg q_h\}$),

• a partition of the variables of $F$ into two sets $\Pi = \{p_1, \ldots, p_s\}$ and $\chi = \{q_1, \ldots, q_h\}$,

• an integer $k'$.

We build a database $DB_{csat}$ and a metaquery $MQ_{csat}$ as follows. $DB_{csat}$ contains:

• a relation p(X,X,F) = {(1,0,0); 

• a relation $q(X, \bar{X}) = \{(1, 0), (0, 1)\}$;

• a relation $ch(Y) = \{(1)\}$;

• a relation

$c'(L_1, L_2, L_3, C) = \{(1,0,0,1), (0,1,0,1), (0,0,1,1), (1,0,1,1), (1,1,0,1), (0,1,1,1), (1,1,1,1), (0,0,0,0)\}$;

• the relation $c(C_1, \ldots, C_n) = \{(1, \ldots, 1)\}$.

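As for $MQ_{csat}$, we introduce, for each boolean variable $p_i$ (resp. $q_i$) of $F$, two variables $P_i$ and $\bar{P}_i$ (resp. $Q_i$ and $\bar{Q}_i$), and $MQ_{csat}$ is as follows:

$c(C_1, \ldots, C_n) \leftarrow P'(P_1, \bar{P}_1, Y), \ldots, P'(P_s, \bar{P}_s, Y), ch(Y), q(Q_1, \bar{Q}_1), \ldots, q(Q_h, \bar{Q}_h), c'(L_{1_1}, L_{1_2}, L_{1_3}, C_1), \ldots, c'(L_{n_1}, L_{n_2}, L_{n_3}, C_n)$

where each $L_{i_j}$ ($1 \le i \le n$, $1 \le j \le 3$) will be

• either $P_y$ if $l_{i_j} = p_y$ and $p_y$ belongs to $\Pi$, or $\bar{P}_y$ if $l_{i_j} = \neg p_y$ and $p_y$ belongs to $\Pi$;

• either $Q_y$ if $l_{i_j} = q_y$ and $q_y$ belongs to $\chi$, or $\bar{Q}_y$ if $l_{i_j} = \neg q_y$ and $q_y$ belongs to $\chi$.

Finally, we set $k$ equal to $k'/2^h$.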

Suppose that $\langle F, k', \Pi, \chi \rangle$ is a YES instance of ∃C-3SAT. Consider an assignment $\rho$ defined over the variables $p_j$ ($1 \le j \le s$) which makes $\langle F, k', \Pi, \chi \rangle$ a YES instance. An instantiation $\sigma$ (either of type 1 or of type 2) which makes $cnf(\sigma(MQ_{csat})) > k'/2^h$ can be built by mapping $P'$ to $p$, and mapping, for $1 \le j \le s$, $P_j$ to the first attribute of $p$ if $\rho(p_j) = true$, or, otherwise, to its second attribute. The join $J$ of the body atoms of $\sigma(MQ_{csat})$ will have precisely $2^h$ tuples. Moreover, if there are at least $k'$ satisfying assignments for the variables within $\chi$, joining the head relation with $J$ will result in a relation $J_h$ (which selects only those tuples of $J$ with each $C_i$ variable set to 1, i.e. the clause $c_i$ is true) containing at least $k'$ tuples. Hence $cnf(\sigma(MQ_{csat})) > k$.

Now, suppose that we have a type-1 (or type-2) instantiation $\sigma$ for $MQ_{csat}$ for which

$cnf(\sigma(MQ_{csat})) > k'/2^h$

(observe that the considerations that follow do not hold for type-0 instantiations). Note that the predicate variable $P'$ is necessarily mapped to $p$; indeed, the only alternative instantiations for $P'$ associate it to $c$ or to $c'$, but, in this case, the empty relation is produced when an atom like, say, $c'(P_i, \bar{P}_i, Y)$ (produced from a relation pattern $P'(P_i, \bar{P}_i, Y)$) is joined to $ch(Y)$.

In order to build an assignment $\rho$ which makes $\langle F, k', \Pi, \chi \rangle$ a YES instance, let $\rho(p_i) = true$ if $P_i$ is mapped to the first attribute of $p$, and $\rho(p_i) = false$ if $P_i$ is mapped to the second attribute of $p$. Observe that it cannot be the case that $P_i$ is mapped to the third attribute of $p$, because, otherwise, the resulting natural join with $ch(Y)$ (e.g. $p(\bar{P}_i, Y, P_i) \bowtie ch(Y)$) would be empty. Observe, then, that the join $J$ of the body atoms of $\sigma(MQ_{csat})$ has precisely $2^h$ tuples. To have $cnf(\sigma(MQ_{csat})) > k'/2^h$, the join of the body and head atoms must have at least $k'$ tuples, each of which represents one of the $k'$ assignments for the variables of $\chi$ that make $F$ true. This completes the proof. □

3.4 The complexity of acyclic metaqueries 

We have shown above that, as far as the combined complexity measure is concerned, metaquerying is intractable. 
Next we discuss some tractable subcases. We recall some definitions first. 

Definition 3.30 A hypergraph $H = (V, E)$ consists of a set $V$ of vertices and a set $E \subseteq 2^V$ of (hyper)edges. An ear of a hypergraph $H = (V, E)$ is an edge $e \in E$ such that, for some distinct edge $w \in E$, called the witness of $e$, no vertex of $e - w$ is in any other edge. We say that $H$ is acyclic if its derived hypergraph $GYO(H)$ is empty, where $GYO(H)$ is built by applying the following steps until there are no ears (see Q) :

1. remove from E all isolated edges, i.e., edges sharing no vertex with other edges. 

2. choose an ear e of H. 

3. remove e from H by deleting it from E and by deleting from V vertices of e not appearing elsewhere in 
E. 

The following notion of acyclicity for metaqueries is a little bit different from the usual one employed for ordinary 
conjunctive queries [Q. In particular, the hypergraph associated with a metaquery will contain both ordinary 
and predicate variables as nodes. 

Definition 3.31 Let $s$ be an atom, a relation pattern, or a metaquery. The set of (both predicate and ordinary) variables of $s$ is denoted $var(s)$, whereas the set of ordinary variables of $s$ is denoted $varo(s)$. Let MQ be a metaquery; define the hypergraph $H(MQ) = (V, E)$ associated with MQ as follows: $V = var(MQ)$, whereas $E$ contains an edge $e_i = var(R_i(\mathbf{X}_i))$ for each literal scheme $R_i(\mathbf{X}_i)$ in MQ. We say that MQ is acyclic if $H(MQ)$ is acyclic. Similarly, define the semi-hypergraph $SH(MQ) = (V', E')$ associated with MQ as follows: $V' = varo(MQ)$, and $E'$ contains an edge $e'_i = varo(R_i(\mathbf{X}_i))$ for each literal scheme $R_i(\mathbf{X}_i)$ in MQ. We say that MQ is semi-acyclic if $SH(MQ)$ is acyclic.

For example, the metaquery

$MQ_1 = P(X, Y) \leftarrow P(Y, Z), Q(Z, W)$

is acyclic, whereas the slightly different metaquery

$MQ_2 = P(X, Y) \leftarrow Q(Y, Z), P(Z, W)$

is cyclic. Finally, the metaquery

$MQ_3 = N(X) \leftarrow N(Y), E(X, Y)$

is semi-acyclic, but it is not acyclic. It is straightforward to show that an acyclic metaquery is semi-acyclic as well.
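The acyclicity test of Definitions 3.30 and 3.31 can be tried on the three example metaqueries with the following sketch. It uses the common formulation of the GYO reduction (repeatedly delete vertices occurring in a single edge, then delete edges that are empty or contained in another edge), which is assumed here to be equivalent to the ear-removal description above. The encoding of a literal scheme as a pair `(predicate variable, ordinary variables)` and all function names are hypothetical.

```python
from collections import Counter

def gyo_is_acyclic(edges):
    """Return True iff the hypergraph given as a list of vertex sets is acyclic."""
    edges = [set(e) for e in edges]
    changed = True
    while changed:
        changed = False
        # Remove every vertex that occurs in exactly one edge.
        counts = Counter(v for e in edges for v in e)
        for e in edges:
            lonely = {v for v in e if counts[v] == 1}
            if lonely:
                e -= lonely
                changed = True
        # Remove empty edges and edges contained in another edge
        # (for duplicated edges only the first copy is kept).
        kept = []
        for i, e in enumerate(edges):
            absorbed = not e or any(
                e < f or (e == f and j < i)
                for j, f in enumerate(edges) if j != i)
            if absorbed:
                changed = True
            else:
                kept.append(e)
        edges = kept
    return not edges

def hypergraph(literal_schemes):
    """Edges over predicate and ordinary variables, as in H(MQ)."""
    return [{("pred", p)} | {("var", x) for x in args} for p, args in literal_schemes]

def semi_hypergraph(literal_schemes):
    """Edges over ordinary variables only, as in SH(MQ)."""
    return [{("var", x) for x in args} for _, args in literal_schemes]

# Head and body literal schemes of MQ1, MQ2 and MQ3 (all predicates are variables).
MQ1 = [("P", "XY"), ("P", "YZ"), ("Q", "ZW")]
MQ2 = [("P", "XY"), ("Q", "YZ"), ("P", "ZW")]
MQ3 = [("N", "X"), ("N", "Y"), ("E", "XY")]

for name, mq in (("MQ1", MQ1), ("MQ2", MQ2), ("MQ3", MQ3)):
    print(name, gyo_is_acyclic(hypergraph(mq)), gyo_is_acyclic(semi_hypergraph(mq)))
```

Tagging vertices as `("pred", ...)` or `("var", ...)` is a design choice that keeps a predicate variable and an ordinary variable with the same name apart.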

Theorem 3.32 Let MQ be an acyclic metaquery and $I \in \mathbf{I}$ a plausibility index. The combined complexity of $\langle DB, MQ, I, 0, 0 \rangle$ is LOGCFL-complete under logspace reductions.

Proof. Hardness. The satisfiability problem for acyclic conjunctive queries was proved hard for LOGCFL in [ p^ . Thus, let $Q = Q_1, \ldots, Q_n$ be an acyclic conjunctive query, where each $Q_i$ is an atom. The (completely instantiated) metaquery

$mq(Q) = Q_1 \leftarrow Q_1, Q_2, \ldots, Q_n$ (5)

is acyclic. By Proposition 3.20, the form of $mq(Q)$ is such that the BCQ problem on $Q$ can be solved by showing a positive value of either cover, support or confidence for $mq(Q)$.

Membership. MQ can be easily reduced, in logspace, to an acyclic conjunctive query $Q_{mq}$, defined on a new database $D_{DB}$. Given a relation $r$, we denote by $a(r)$ its arity. The reduction is as follows:

• For each relation name $r$ in DB we introduce a new constant value $n_r$;

• For each arity $a$ of some relation of DB, we introduce a relation $u_a$ of arity $a + 1$. The extension of $u_a$ in $D_{DB}$ is such that, if $t = (t_1, \ldots, t_a)$ is a tuple belonging to a relation $r$ of arity $a$, then the tuple $(n_r, t_1, \ldots, t_a)$ is in $u_a$.

• Assume that MQ is of the form

$T(T_1, \ldots, T_{a(T)}) \leftarrow L_1(X^1_1, \ldots, X^1_{a(L_1)}), \ldots, L_m(X^m_1, \ldots, X^m_{a(L_m)})$

then, if $I \neq sup$, we set $Q_{mq}$ to

$u_{a(T)}(T, T_1, \ldots, T_{a(T)}), u_{a(L_1)}(L_1, X^1_1, \ldots, X^1_{a(L_1)}), \ldots, u_{a(L_m)}(L_m, X^m_1, \ldots, X^m_{a(L_m)}).$

If $I = sup$, we set $Q_{mq}$ to

$u_{a(L_1)}(L_1, X^1_1, \ldots, X^1_{a(L_1)}), \ldots, u_{a(L_m)}(L_m, X^m_1, \ldots, X^m_{a(L_m)}).$

It is easy to see that an instance of $\langle DB, MQ, I, 0, 0 \rangle$ evaluates to true iff $Q_{mq}$ has a non-empty answer over $D_{DB}$. □
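The membership reduction above can be sketched in a few lines. This is a hedged illustration only: the relation name itself plays the role of the constant $n_r$, the dictionary-based database encoding and the names `encode_database`, `encode_metaquery` and `has_answer` are hypothetical, and `has_answer` is a brute-force evaluator used to check the construction on tiny inputs rather than the LOGCFL procedure for acyclic queries.

```python
from itertools import product

def encode_database(db):
    """Build the relations u_a: each tuple of r (arity a) becomes (name of r, t_1, ..., t_a)."""
    u = {}
    for name, tuples in db.items():
        for t in tuples:
            u.setdefault(len(t), set()).add((name,) + t)
    return u

def encode_metaquery(head, body, index):
    """Turn literal schemes (predicate variable, ordinary variables) into atoms over u_a.

    For index 'sup' only the body is used, otherwise head and body.
    """
    schemes = body if index == "sup" else [head] + body
    return [(len(args), (pred,) + tuple(args)) for pred, args in schemes]

def has_answer(query, u):
    """Brute force: is there a substitution sending every atom into its relation u_a?"""
    variables = sorted({v for _, args in query for v in args})
    domain = sorted({c for rel in u.values() for t in rel for c in t}, key=str)
    for values in product(domain, repeat=len(variables)):
        rho = dict(zip(variables, values))
        if all(tuple(rho[v] for v in args) in u.get(a, set()) for a, args in query):
            return True
    return False

# MQ1 = P(X,Y) <- P(Y,Z), Q(Z,W) over two small hypothetical databases.
head = ("P", ("X", "Y"))
body = [("P", ("Y", "Z")), ("Q", ("Z", "W"))]
db_a = {"p": {(1, 2), (2, 3)}, "q": {(3, 4)}}
db_b = {"p": {(1, 2)}, "q": {(2, 3)}}
print(has_answer(encode_metaquery(head, body, "cnf"), encode_database(db_a)))
print(has_answer(encode_metaquery(head, body, "sup"), encode_database(db_b)))
```

Because predicate variables become ordinary first arguments of the $u_a$ atoms, a type-0 instantiation and a ground substitution are found together by a single conjunctive query.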

However, acyclicity is not sufficient to guarantee tractability in general, as shown next for instantiation types other than type-0.

Theorem 3.33 Let $I \in \mathbf{I}$ be a plausibility index, let $T \in \{1, 2\}$, and let MQ be an acyclic metaquery. The combined complexity of $\langle DB, MQ, I, 0, T \rangle$ is NP-complete.



Proof. Membership follows from Theorem 3.21. As for hardness, we show that we can reduce the NP-complete problem HAMILTONIAN PATH [|l§ to $\langle DB, MQ, I, 0, T \rangle$. An instance of HAMILTONIAN PATH deals with an undirected graph $G = (V, E)$, and asks if $G$ contains a Hamiltonian path, i.e. a path touching each node in $V$ exactly once. Without loss of generality, we can assume that $|V| > 2$.

We build a database $DB_{ham}$ and a metaquery $MQ_{ham}$. $DB_{ham}$ contains a relation $g$, with a single tuple encoding node names, say $t = (v_1, \ldots, v_n)$, and a binary relation $e$, storing one tuple for each edge in $E$. The metaquery $MQ_{ham}$ is

$N(X_1, \ldots, X_n) \leftarrow N(X_1, \ldots, X_n), e(X_1, X_2), \ldots, e(X_{n-1}, X_n)$

Intuitively, we use $N$ to select a permutation of the nodes of $G$, and the body of $MQ_{ham}$ encodes the constructed Hamiltonian path. Now, suppose a Hamiltonian path $p = (v_{u_1}, \ldots, v_{u_n})$ exists: we can build from $p$ the following set of ground atoms:

$\{g(v_{u_1}, \ldots, v_{u_n}), e(v_{u_1}, v_{u_2}), \ldots, e(v_{u_{n-1}}, v_{u_n})\}$

which is satisfied over $DB_{ham}$. By Proposition 3.20, this certifies that $\langle DB_{ham}, MQ_{ham}, I, 0, T \rangle$ is a YES instance, for any $I \in \mathbf{I}$. Similarly, observing that $N$ can suitably match only with $g$ (both for type-1 and for type-2 instantiations), a YES-certificate for $\langle DB_{ham}, MQ_{ham}, I, 0, T \rangle$ allows us to build a Hamiltonian path of $G$. Moreover, $MQ_{ham}$ is acyclic: take $\{N, X_1, \ldots, X_n\}$ as witness for $\{X_i, X_{i+1}\}$, $1 \le i \le n - 1$. □



Theorem 3.34 Let $I \in \mathbf{I}$ be a plausibility index, let $T \in \{1, 2\}$, and let MQ be an acyclic metaquery. The combined complexity of $\langle DB, MQ, sup, k, T \rangle$ and $\langle DB, MQ, cvr, k, T \rangle$ is NP-complete.

Proof. Membership follows from Theorem 3.24, whereas hardness is by reduction from $\langle DB, MQ, sup, 0, T \rangle$ to $\langle DB, MQ, sup, k, T \rangle$ and from $\langle DB, MQ, cvr, 0, T \rangle$ to $\langle DB, MQ, cvr, k, T \rangle$, respectively. □

One can ask if disregarding predicate variables, and thus referring to semi-acyclic metaqueries, instead of 
considering acyclic ones would be sufficient in order to give a polynomial evaluation algorithm as far as type-0 
metaqueries are concerned. The next result shows that, unfortunately, the evaluation of semi-acyclic type-0 
metaqueries is not simpler than evaluating general metaqueries. 

Theorem 3.35 Let $I \in \mathbf{I}$ be a plausibility index and let MQ be a semi-acyclic metaquery. The combined complexity of $\langle DB, MQ, I, 0, 0 \rangle$ is NP-complete.

Proof. Membership follows from Theorem 3.21. As for hardness, we proceed by reducing the NP-complete problem 3-COLORING Q to the problem of evaluating type-0 semi-acyclic metaqueries. Let $G = (V, E)$ be an undirected graph. We build a database $DB_{3col}$ and a metaquery $MQ_{3col}$ as follows. $DB_{3col}$ contains three binary relations $r'$, $g'$ and $b'$ with the following extensions: $r'(X, Y) = \{(g, r), (b, r)\}$, $g'(X, Y) = \{(r, g), (b, g)\}$, $b'(X, Y) = \{(g, b), (r, b)\}$.

In $MQ_{3col}$ we use ordinary variables $X_u$ and predicate variables $X'_u$, one for each node $u \in V$. We denote by "_" a new variable not occurring elsewhere in $MQ_{3col}$ (a mute variable).

Let $E = \{(u_1, v_1), \ldots, (u_m, v_m)\}$ and $V = \{z_1, \ldots, z_n\}$. Then $MQ_{3col}$ contains both the following sets of literals:

$S' = \{X'_{u_1}(X_{v_1}, \_), \ldots, X'_{u_m}(X_{v_m}, \_)\}$

$S'' = \{X'_{z_1}(\_, X_{z_1}), \ldots, X'_{z_n}(\_, X_{z_n})\}$

Intuitively, $S'$ has the same purpose as in Theorem 3.21, that is, it encodes $G$ as a set of literals. The atoms of $S''$ force each predicate variable $X'_{z_i}$ to represent the same color as the corresponding variable $X_{z_i}$. $MQ_{3col}$ is as follows:

$X'_{u_1}(X_{v_1}, \_) \leftarrow X'_{u_1}(X_{v_1}, \_), \ldots, X'_{u_m}(X_{v_m}, \_), X'_{z_1}(\_, X_{z_1}), \ldots, X'_{z_n}(\_, X_{z_n})$

As an example, let $G = (\{1, 2, 3\}, \{(1, 2), (3, 1), (3, 2)\})$. In this case, $MQ_{3col}$ is

$X'_1(X_2, \_) \leftarrow X'_1(X_2, \_), X'_3(X_1, \_), X'_3(X_2, \_), X'_1(\_, X_1), X'_2(\_, X_2), X'_3(\_, X_3)$

Note that, in general, $MQ_{3col}$ might not be acyclic, but it is semi-acyclic. In fact, let $SH(MQ_{3col}) = (H_V, H_E)$ be the semi-hypergraph associated to $MQ_{3col}$. $H_V$ contains:

• the set $A$ of variables $X_{z_1}, \ldots, X_{z_n}$;

• a set $\Phi$ of mute variables $\phi_1, \ldots, \phi_m$, one for each atom of $S'$;

• a set $\Psi$ of mute variables $\psi_1, \ldots, \psi_n$, one for each atom of $S''$.

$H_E$ is a set of hyperedges either of the form $\{X_i, \phi_j\}$ or of the form $\{\psi_i, X_i\}$. Since variables of $\Psi$ and variables of $\Phi$ appear in at most one hyperedge, and there is no hyperedge sharing two different variables of $A$, we can conclude that $SH(MQ_{3col})$ is acyclic, and hence $MQ_{3col}$ is semi-acyclic.

Similarly to the proof of Theorem 3.21, we conclude this proof by showing the following:



Claim 3.36 $G$ has a 3-coloring iff there exists a type-0 instantiation $\sigma$ such that $\sigma(S')$ has a ground instance $s'$ and $\sigma(S'')$ has a ground instance $s''$ such that $s' \cup s''$ is true in DB.

($\Rightarrow$). Suppose $G$ has a 3-coloring $c : V \to \{r, g, b\}$, defined for each $u \in V$, and such that if $(u, v) \in E$ then $c(u) \neq c(v)$. Let $c' : V \to \{r', g', b'\}$ be a coloring isomorphic to $c$. Consider then an instantiation $\sigma$ which maps each literal of the form $X'_u(X_v, \phi)$ to an atom of the form $c'(u)(X_v, \phi)$, and each atom of the form $X'_z(\psi, X_z)$ to an atom of the form $c'(z)(\psi, X_z)$, where $c'(u), c'(z) \in \{r', g', b'\}$. A ground instance $s' \cup s''$ of $\sigma(S' \cup S'')$ which is true in DB is

$\underbrace{c'(u_1)(c(v_1), c(u_1)), \ldots, c'(u_m)(c(v_m), c(u_m))}_{s'}, \underbrace{c'(z_1)(\beta_1, c(z_1)), \ldots, c'(z_n)(\beta_n, c(z_n))}_{s''}$

where the constants $\beta_1, \ldots, \beta_n$ are chosen accordingly.

($\Leftarrow$). Suppose that there is an instantiation $\sigma$ of type 0 for $MQ_{3col}$ that makes a given ground instance $s' \cup s''$ of $\sigma(S' \cup S'')$ true in $DB_{3col}$. $\sigma$ maps each predicate variable $X'_u$ to either $r'$, $g'$ or $b'$. $s' \cup s''$ has the form

$\underbrace{c'_{u_1}(c_{v_1}, c_{u_1}), \ldots, c'_{u_m}(c_{v_m}, c_{u_m})}_{s'}, \underbrace{c'_{z_1}(\beta_1, c_{z_1}), \ldots, c'_{z_n}(\beta_n, c_{z_n})}_{s''}$

where each constant $c_{z_i}$, $c_{v_j}$ ($1 \le j \le m$, $1 \le i \le n$) belongs to $\{r, g, b\}$, and each constant $c'_{u_j}$ or $c'_{z_i}$ ($1 \le j \le m$, $1 \le i \le n$) belongs to $\{r', g', b'\}$. Consider now the following mapping:

$M_{S' \cup S'', s' \cup s''} = \{X'_{u_1}(X_{v_1}, \phi_1) \mapsto c'_{u_1}(c_{v_1}, c_{u_1}), \ldots, X'_{u_m}(X_{v_m}, \phi_m) \mapsto c'_{u_m}(c_{v_m}, c_{u_m}), X'_{z_1}(\psi_1, X_{z_1}) \mapsto c'_{z_1}(\beta_1, c_{z_1}), \ldots, X'_{z_n}(\psi_n, X_{z_n}) \mapsto c'_{z_n}(\beta_n, c_{z_n})\}$

from $S' \cup S''$ to $s' \cup s''$. Note that if a constant $c_u$ is $r$ then $c'_u$ is $r'$, whereas if a constant $c_u$ is $g$ (resp. $b$) then $c'_u$ is $g'$ (resp. $b'$). Since it cannot be the case that $c_{u_j} = c_{v_j}$ for an atom $c'_{u_j}(c_{v_j}, c_{u_j}) \in s'$, we can build a valid 3-coloring $c : V \to \{1, 2, 3\}$ by setting $c(z_i) = c_{z_i}$ for each $z_i \in V$. This closes the proof of the Claim. We can then resume the proof of Theorem 3.35.

The Theorem follows by noting that $MQ_{3col}$ is built in order for $\sigma(S' \cup S'')$ to be a certifying set for cvr, sup and cnf, for any type-0 instantiation $\sigma$, and, therefore, $cvr(\sigma) > 0$, $sup(\sigma) > 0$ and $cnf(\sigma) > 0$ iff Claim 3.36 holds. □



3.5 Data complexity 

In this section we discuss the data complexity of metaquerying problems. As for most query languages, the data complexity is much lower than the combined complexity. In particular, in some interesting cases, it lies very low in the complexity hierarchy, as proven next.

Theorem 3.37 Under the data complexity measure (fixed metaquery and threshold value, variable database), $\langle DB, MQ, I, 0, T \rangle$ is in $AC^0$, for $I \in \mathbf{I}$ and for $T \in \{0, 1, 2\}$.

Proof. Under the data complexity measure, the number of type-$T$ instantiations for MQ and DB is constant.

We can build a circuit solving the instance $\langle DB, MQ, I, 0, T \rangle$ at hand as follows. Let $\sigma$ be a generic instantiation of MQ; let $\sigma(MQ) = Q_1 \leftarrow Q_2, \ldots, Q_n$, where the $Q_i$'s are atoms; let $q'(\sigma(MQ))$ and $q''(\sigma(MQ))$ denote the boolean conjunctive queries $Q' = Q_1, \ldots, Q_n$ and $Q'' = Q_2, \ldots, Q_n$, respectively. By Proposition 3.20, for $I \in \{cnf, cvr\}$ (resp. $I = sup$), $I(\sigma(MQ)) > 0$ iff $Q'$ (resp. $Q''$) is satisfiable in DB. Now, it is known (see 1^) that any conjunctive query $Q$ is solved by a constant-depth polynomial-size logspace-uniform family of boolean circuits, call it $\{C(Q)_i\}$, where, for each $i$, $C(Q)_i$ solves $Q$ when the input database instance has size $i$. Let $\Sigma_T$ be the set of all type-$T$ instantiations for the metaquery at hand. For a fixed database size $i$, consider the circuit $C(MQ)_i$ obtained by connecting the outputs of all the circuits $C(q'(\sigma(MQ)))_i$ (resp. $C(q''(\sigma(MQ)))_i$), with $\sigma \in \Sigma_T$, through an OR gate. Since $|\Sigma_T|$ is polynomial, $C(MQ)_i$ has constant depth and polynomial size. Hence the result follows. □



In the general case, the data complexity of metaquerying is within $TC^0$.

Theorem 3.38 Let $T \in \{0, 1, 2\}$. Let MQ be a metaquery, DB be a database, $I \in \mathbf{I}$ a plausibility index, and $0 \le k < 1$. The data complexity of the metaquerying problem $\langle DB, MQ, I, k, T \rangle$ is in $TC^0$.

Proof. Any project-join expression $Q$ is solved by a constant-depth polynomial-size logspace-uniform family of multiple-output circuits of unbounded fan-in AND, OR, and NOT gates (see [Q); call this family $\{C'(Q)_i\}$, where, for each $i$, $C'(Q)_i$ calculates the output of $Q$ when the input database instance has size $i$. In particular, each $C'(Q)_i$ has $M_i$ boolean outputs, one for each tuple potentially in the result of $Q$. Note that, under the data complexity measure, for each database size $i$, $M_i$ is polynomial in $i$. Let $C'_j(Q)_i$ be the subcircuit of $C'(Q)_i$ deciding if the $j$-th tuple in the potential result of $Q$ belongs to its output, for $j = 1, \ldots, M_i$. Since every language in $AC^0$ has its characteristic function in $\#AC^0$, there exists a $\#AC^0$ circuit $C''_j(Q)_i$ equivalent to $C'_j(Q)_i$, for each $j = 1, \ldots, M_i$.

Let $\{count(Q)_i\}$ be the family of circuits obtained by connecting the outputs of the circuits $C''_j(Q)_i$, for each $j = 1, \ldots, M_i$, to a single $+$ gate, for each $i > 0$. Then $\{count(Q)_i\}$ is a family of $\#AC^0$ circuits computing $|Q|$. Next, we prove a technical lemma that we will use to conclude the proof.

Lemma 3.39 Let $r$ be a Horn rule, $Q_n$ and $Q_d$ be two polynomially sized project-join expressions defined over DB and the predicates of $r$, and $k$ be a rational value (such that $0 \le k < 1$, with $k$ encoded as a pair of naturals $(a, b)$ such that $k = a/b$). Then there exists a constant-depth polynomial-size (w.r.t. the size of DB) uniform family $\{C(r)_i\}$ of circuits of unbounded fan-in NOT and MAJORITY gates, such that $C(r)_i$ outputs 1 iff $|Q_n| / |Q_d| > k$, where $|DB| = i$.

Proof. Consider the function $f(DB, r) = b|Q_n| - a|Q_d|$, taking integer values. Clearly, $|Q_n| / |Q_d| > k$ iff $f(DB, r) > 0$. We recall the following result of Q: for each integer $N$ there exists a log-time uniform $\#AC^0$ circuit which, having in input the binary representation of $N$, computes $N$. Call this circuit $number(N)$. Since $a$ and $b$ are integers, it is easy to build two $\#AC^0$ circuits computing the functions $b|Q_n|$ and $a|Q_d|$, connecting $number(b)$ to $count(Q_n)_i$ (resp. $number(a)$ to $count(Q_d)_i$) through a $\times$ gate. Then, the function $f$ is in the class $GapAC^0$, and the language $\{r \mid |Q_n| / |Q_d| > k \text{ over } DB\}$, where $r$ is a Horn rule over DB, is in the class $PAC^0 = TC^0$. Therefore there exists a constant-depth polynomial-size uniform family $\{C(r)_i\}$ of circuits of unbounded fan-in MAJORITY and NOT gates, such that $C(r)_i$ outputs 1 iff $|Q_n| / |Q_d| > k$, when the input database instance has size $i$. □
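The arithmetic step at the core of Lemma 3.39, namely replacing the comparison of a ratio with a threshold by a sign test on an integer function, can be checked with a short sketch. The function names are hypothetical, the two counts stand for $|Q_n|$ and $|Q_d|$, and the code says nothing about circuit depth or uniformity; it only illustrates the equivalence $|Q_n|/|Q_d| > a/b \iff b|Q_n| - a|Q_d| > 0$ for $b > 0$ and $|Q_d| > 0$.

```python
from fractions import Fraction

def exceeds_threshold(n_count, d_count, a, b):
    """Decide n_count / d_count > a / b through the sign of f = b*n_count - a*d_count.

    Assumes b > 0 and d_count > 0, so that no division is needed.
    """
    f = b * n_count - a * d_count
    return f > 0

def exceeds_threshold_by_division(n_count, d_count, a, b):
    """Reference version that compares the two exact rationals directly."""
    return Fraction(n_count, d_count) > Fraction(a, b)

# |Q_n| = 3, |Q_d| = 4 and k = 1/2: f = 2*3 - 1*4 = 2 > 0.
print(exceeds_threshold(3, 4, 1, 2))
```

Working with $f$ avoids any division, which is what lets the test be expressed with counting, multiplication by constants and a final sign check.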

In order to conclude our proof, we build a $TC^0$ circuit family $\{C(r)_i\}$, for $I = cvr$, $I = cnf$ and $I = sup$, respectively, as follows:

• for $I = cvr$ and $I = cnf$, since both these indices have the form $|Q_n| / |Q_d|$, $\{C(r)_i\}$ is given as in Lemma 3.39;

• for $I = sup$, let $r$ be of the form $h(\mathbf{X}_0) \leftarrow l_1(\mathbf{X}_1), \ldots, l_n(\mathbf{X}_n)$. Consider the set of circuit families $C = \{\{C^1(r)_i\}, \ldots, \{C^n(r)_i\}\}$, where each family $\{C^j(r)_i\}$ ($1 \le j \le n$) is associated with the function $f_j(DB, r) = b|Q_j| - a|Q'_j|$, where $Q_j$ is the natural join of $l_1, \ldots, l_n$ projected over the attributes of $l_j$, and $Q'_j$ is simply $l_j$. By Lemma 3.39, each $C^j(r)_i \in C$ ($1 \le j \le n$) is $TC^0$ and outputs 1 iff the corresponding ratio is greater than $k$, when the input database instance has size $i$. Connecting the $n$ (a constant value) circuits, for each $i$, through an OR gate, we obtain the needed circuit family $\{C(r)_i\}$.

Hence the result follows. □



Remark. Note that the proofs of Theorem 3.37 and Theorem 3.38 outline a general strategy for proving membership in $AC^0$/$TC^0$ for those indices either defined as, or reducible to, a ratio between two project-join expressions defined (and polynomially sized) over the atoms of a given instantiated metaquery.



4 Algorithms for metaquery answering 

In this section we address the problem of efficiently computing all the instantiations $\sigma$ of a fixed metaquery MQ on an input database DB, such that the values $cnf(\sigma(MQ))$, $cvr(\sigma(MQ))$, and $sup(\sigma(MQ))$ are greater than user-provided thresholds $k_{cnf}$, $k_{cvr}$, and $k_{sup}$, respectively. We decompose our answering problem into the following three subproblems:

1. Find all the partial instantiations $\sigma_b$ defined on $body(MQ)$ such that the support obtained by applying $\sigma_b$ to MQ is greater than $k_{sup}$ in DB.

2. For each partial instantiation $\sigma_b$ found in step 1, find all the partial instantiations $\sigma_h$ defined on $head(MQ)$ such that $\sigma = \sigma_h \circ \sigma_b$ is an instantiation defined on MQ, and $cvr(\sigma(MQ)) > k_{cvr}$.

3. Return as a solution the instantiations $\sigma$ found in step 2 such that $cnf(\sigma(MQ)) > k_{cnf}$.
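The three checks above can be sketched for a single, already instantiated rule as follows. This is a naive illustration, not the algorithm developed in this section: it materializes every join, and it assumes the following readings of the indices, which should be checked against the formal definitions given earlier in the paper: support is the maximum over body relations $r_i$ of $|\pi_{att(r_i)}(J(body))| / |r_i|$, cover is $|\pi_{att(h)}(J(body \cup \{h\}))| / |h|$, and confidence is the fraction of tuples of $J(body)$ that also join with the head. A relation is encoded (hypothetically) as a pair `(attributes, set of tuples)`, and all relations are assumed non-empty.

```python
from fractions import Fraction
from functools import reduce

def natural_join(r, s):
    """Natural join of two relations given as (attributes, set of tuples)."""
    (ra, rr), (sa, sr) = r, s
    shared = [a for a in ra if a in sa]
    extra = [a for a in sa if a not in ra]
    rows = {x + tuple(y[sa.index(a)] for a in extra)
            for x in rr for y in sr
            if all(x[ra.index(a)] == y[sa.index(a)] for a in shared)}
    return (tuple(ra) + tuple(extra), rows)

def project(r, attrs):
    """Projection of a relation on the given attributes (duplicates collapse)."""
    ra, rr = r
    return (tuple(attrs), {tuple(x[ra.index(a)] for a in attrs) for x in rr})

def size(r):
    return len(r[1])

def support(body):
    j = reduce(natural_join, body)
    return max(Fraction(size(project(j, b[0])), size(b)) for b in body)

def cover(head, body):
    j = reduce(natural_join, body + [head])
    return Fraction(size(project(j, head[0])), size(head))

def confidence(head, body):
    jb = reduce(natural_join, body)
    jbh = natural_join(jb, head)
    return Fraction(size(project(jbh, jb[0])), size(jb))

def passes_thresholds(head, body, k_sup, k_cvr, k_cnf):
    """Steps 1 to 3 for one instantiated rule: support, then cover, then confidence."""
    if not support(body) > k_sup:
        return False
    if not cover(head, body) > k_cvr:
        return False
    return confidence(head, body) > k_cnf

# Hypothetical instantiated rule h(X,Y) <- b1(X,Z), b2(Z,Y).
head = (("X", "Y"), {(1, 3), (7, 7)})
body = [(("X", "Z"), {(1, 2), (4, 5)}), (("Z", "Y"), {(2, 3), (2, 9)})]
print(support(body), cover(head, body), confidence(head, body))
```

Checking support first mirrors the ordering of the three subproblems: a body that fails the support threshold is discarded before any head is considered.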

The rationale behind this choice is as follows. First, we note that a high-support body is potentially shared by various rules having different heads. Furthermore, by definition, the computation of the support requires reducing, one after another, the relations in the rule's body, that is, computing $\pi_{att(r_i)}(J(body(r)))$ for each $r_i \in body(r)$; reducing a relation $r_i$ w.r.t. a set of relations $S$ can sometimes (i.e., when some acyclicity criterion is met) be performed without explicitly computing $J(S)$, thereby gaining in efficiency. Similarly, the computation of the cover (step 2) requires only the reduction of the relation associated to the head. Finally, the computation of the overall join (step 3) becomes less expensive if the involved relations are reduced (see p^).

Figure 3: The join tree of Example 4.3 (root Q(B,C), with children R(C,D) and P(A,B))

Next, we recall and/or extend some definitions related to computing conjunctive queries. For simplicity of 
notation, as we have also done above, in the following we refer to atoms and to associated relations interchange- 
ably, unless ambiguity arises. 



Definition 4.1 Let $r_1, \ldots, r_n$ be a set of relations. We say that $r_i$, $i \in \{1, \ldots, n\}$, is reduced w.r.t. $r_1, \ldots, r_n$ if $r_i = \pi_{att(r_i)}(r_1 \bowtie \cdots \bowtie r_n)$. We say that $r_1, \ldots, r_n$ is reduced if $r_i$ is reduced w.r.t. $r_1, \ldots, r_n$, for each $i = 1, \ldots, n$.



Definition 4.2 A join tree for a set of literal schemes $Q$ is a tree $T$, whose vertices are the literal schemes of $Q$, such that whenever the same ordinary variable $X$ occurs in two literal schemes $L_1$ and $L_2$, then $X$ occurs in each literal scheme on the unique path linking $L_1$ and $L_2$ (see [Q).

Example 4.3 Let $Q$ be the set $\{P(A, B), Q(B, C), R(C, D)\}$ of literal schemes. Then a join tree for $Q$ is reported in Figure 3.

It is then easy to generalize analogous results proved for conjunctive queries, to show that a metaquery is semi-acyclic iff its set of literals has a join tree [Q, p^ .

Definition 4.4 Let $Q = \{r_1, \ldots, r_n\}$ be a set of atoms on a database DB. A semijoin step is an expression of the form $r_i \ltimes r_j$, with $1 \le i, j \le n$. A semijoin program is a sequence of semijoin steps. A semijoin program is called a full reducer for $Q$ if, after executing that program, each $r_i$ is reduced, independently of the initial values of the relations (see

A full reducer is a way to efficiently reduce a set of atoms. Unfortunately, not every set of atoms has a full reducer. Indeed, it is proved in |^ that a set of atoms has a full reducer iff it is semi-acyclic.

For a semi-acyclic set of atoms $Q$, a full reducer consists of a sequence of two semijoin programs of the same length, called first-half and second-half, respectively. Let $T$ be a rooted join tree for $Q$. The first-half is obtained by performing a bottom-up visit of $T$: let $r_i$ be the current node in the visit; then, for each child $r_j$ of $r_i$ in $T$, add $r_i := r_i \ltimes r_j$ as the next step of the sequence. The second-half is then obtained from the first-half by reversing the sequence and exchanging the relations, i.e. from $r_i := r_i \ltimes r_j$ we obtain $r_j := r_j \ltimes r_i$.

Example 4.5 Consider the set $Q = \{p(A, B), q(B, C), r(C, D)\}$. Let $T = (Q, \{(q(B, C), p(A, B)), (q(B, C), r(C, D))\})$ be an associated join tree. Suppose $T$ is rooted at $q(B, C)$. A full reducer for $Q$ is:

$q(B, C) := q(B, C) \ltimes r(C, D)$    (first-half)
$q(B, C) := q(B, C) \ltimes p(A, B)$    (first-half)
$p(A, B) := p(A, B) \ltimes q(B, C)$    (second-half)
$r(C, D) := r(C, D) \ltimes q(B, C)$    (second-half)
In general, to compute the value of support, we need to reduce a generic set of atoms, not necessarily a semi- 
acyclic one. To achieve this goal efficiently we exploit the concept of hypertree decomposition of a conjunctive 
query (see also |jl^ for a detailed discussion on the concept of "degree of cyclicity" of a conjunctive query). 

Definition 4.6 Let Q be a set of literal schemes. A hypertree for Q is a triple (T, x, A), where T is a rooted 
tree, and x and A are labeling functions which associate to each vertex p of T two sets x{p) ^ varoiQ) and 

Kp) ^ Q- 

Definition 4.7 Let T be a rooted tree. We denote by $vertices(T)$ the set of nodes of T. Define $\chi(T)$ as
$\bigcup_{p \in vertices(T)} \chi(p)$. For $p \in vertices(T)$, $T_p$ denotes the subtree of T rooted at p. A hypertree decomposition
[17] of a set of literal schemes Q is a hypertree $\langle T, \chi, \lambda \rangle$ for Q such that

1. for each literal scheme $L \in Q$, there exists $p \in vertices(T)$ such that $var_o(L) \subseteq \chi(p)$;

2. for each ordinary variable $Y \in var_o(Q)$, the set $\{p \in vertices(T) \mid Y \in \chi(p)\}$ induces a (connected)
subtree of T;

3. for each vertex $p \in vertices(T)$, $\chi(p) \subseteq var_o(\lambda(p))$;

4. for each vertex $p \in vertices(T)$, $var_o(\lambda(p)) \cap \chi(T_p) \subseteq \chi(p)$.

A hypertree decomposition $\langle T, \chi, \lambda \rangle$ of Q is complete if, for each $L \in Q$, there exists $p \in vertices(T)$ such that
$var_o(L) \subseteq \chi(p)$ and $L \in \lambda(p)$.

We refer to [17] for a complete discussion of the hypertree decomposition of conjunctive queries.

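Example 4.8 Let $Q^{ex}$ be the set $\{P(A,B), Q(B,C), R(C,D), S(B,D)\}$ of literal schemes. Let $T^{ex}$ be the
tree with edges $\{(p_1,p_2), (p_2,p_3)\}$ rooted at $p_1$, and let $\chi^{ex}$ and $\lambda^{ex}$ be the labelings with $\chi^{ex}(p_1) = \{A,B\}$,
$\chi^{ex}(p_2) = \{B,C\}$, $\chi^{ex}(p_3) = \{B,C,D\}$, $\lambda^{ex}(p_1) = \{P(A,B)\}$, $\lambda^{ex}(p_2) = \{Q(B,C)\}$, and
$\lambda^{ex}(p_3) = \{R(C,D), S(B,D)\}$. Then $\langle T^{ex}, \chi^{ex}, \lambda^{ex} \rangle$ is a hypertree decomposition of $Q^{ex}$.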

The width of a hypertree decomposition $\langle T, \chi, \lambda \rangle$ is defined as $\max_{p \in vertices(T)} |\lambda(p)|$. The hypertree-width
$hw(Q)$ of a set of literal schemes Q is defined as the minimum width over all its hypertree decompositions.

The notion of bounded hypertree-width generalizes the notion of semi-acyclicity for metaqueries. Indeed, a
set of literal schemes Q is semi-acyclic iff $hw(Q) = 1$.
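The four conditions of Definition 4.7, completeness, and the width can be checked mechanically on a small instance. The Python sketch below does this for the decomposition of Example 4.8; the encoding (dictionaries for the variables of each literal scheme, for the tree, and for the two labelings) and all function names are assumptions made for illustration only.

```python
def var_o(names, atoms):
    """Union of the ordinary variables of the named literal schemes."""
    return set().union(*(atoms[a] for a in names))


def subtree(p, children):
    """Vertices of the subtree rooted at p."""
    out = [p]
    for c in children.get(p, []):
        out += subtree(c, children)
    return out


def is_hypertree_decomposition(atoms, children, chi, lam):
    vertices = list(chi)
    parent = {c: p for p, cs in children.items() for c in cs}
    # 1. every literal scheme is covered by some chi(p)
    cond1 = all(any(vs <= chi[p] for p in vertices) for vs in atoms.values())

    # 2. connectedness: the vertices containing Y have at most one "top" vertex
    def tops(y):
        return sum(1 for p in vertices
                   if y in chi[p] and (p not in parent or y not in chi[parent[p]]))

    cond2 = all(tops(y) <= 1 for y in set().union(*atoms.values()))
    # 3. chi(p) is contained in var_o(lambda(p))
    cond3 = all(chi[p] <= var_o(lam[p], atoms) for p in vertices)
    # 4. var_o(lambda(p)) ∩ chi(T_p) is contained in chi(p)
    cond4 = all(
        var_o(lam[p], atoms)
        & set().union(*(chi[q] for q in subtree(p, children))) <= chi[p]
        for p in vertices)
    return cond1 and cond2 and cond3 and cond4


def is_complete(atoms, chi, lam):
    return all(any(atoms[a] <= chi[p] and a in lam[p] for p in chi) for a in atoms)


def width(lam):
    return max(len(lam[p]) for p in lam)


# Example 4.8
atoms = {"P": {"A", "B"}, "Q": {"B", "C"}, "R": {"C", "D"}, "S": {"B", "D"}}
children = {"p1": ["p2"], "p2": ["p3"]}  # tree rooted at p1
chi = {"p1": {"A", "B"}, "p2": {"B", "C"}, "p3": {"B", "C", "D"}}
lam = {"p1": {"P"}, "p2": {"Q"}, "p3": {"R", "S"}}
```

Note that the sketch only verifies a given decomposition; it does not search for one of minimum width.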



function findRules(DB, MQ, $k_{cnf}$, $k_{cvr}$, $k_{sup}$, T): set of instantiations;
var E: set of instantiations;
    r, s: array [1..n] of relation;

  procedure findHeads($\sigma_b$: instantiation);

    function enoughSupport: boolean;
    begin
      for each literal scheme $a \in body(MQ)$ do begin
        let i be such that $var_o(a) \subseteq \chi(p_{\nu(i)})$ and $a \in \lambda(p_{\nu(i)})$;
        let $r_a$ be the relation associated to $\sigma_b(a)$ in DB;
        if $|\pi_{var_o(a)}(s[i])| / |r_a| > k_{sup}$ then return true;
      end;
      return false;
    end; { enoughSupport }

  begin
    if enoughSupport then begin
      compute from s the relation $b = J(\sigma_b(body(MQ)))$;
      for each type-T instantiation $\sigma_h$ for head(MQ) that agrees with $\sigma_b$ do begin
        set h to be the relation associated to head(MQ) by $\sigma_h$;
        [...]
        if [...] then $E := E \cup \{\sigma_b \circ \sigma_h\}$;
      end
    end
  end; { findHeads }

  procedure findBodies(i: integer; $\sigma_b$: instantiation);
  begin
    if $i \le n$ then (* first half *)
      for each type-T instantiation $\sigma_i$ for $\lambda(p_{\nu(i)})$ that agrees with $\sigma_b$ do begin
        [...]
        for each child $p_j$ of $p_{\nu(i)}$ do $r[i] := r[i] \ltimes r[\nu^{-1}(j)]$;
        if $r[i] \ne \emptyset$ then findBodies(i + 1, $\sigma_b \circ \sigma_i$);
      end
    else begin (* second half *)
      $s[n] := r[n]$;
      for j := n - 1 downto 1 do $s[j] := r[j] \ltimes s[\nu^{-1}(father(p_{\nu(j)}))]$;
      findHeads($\sigma_b$);
    end;
  end; { findBodies }

begin
  compute c, the hypertree-width of body(MQ);
  compute $\langle T, \chi, \lambda \rangle$, a complete hypertree decomposition for body(MQ) of width c;
  compute a bottom-up visit of $T = (\{p_1, \ldots, p_n\}, E)$ and encode it as a permutation $\nu$ of $\{1, \ldots, n\}$;
  $E := \emptyset$;
  findBodies(1, $\emptyset$);
  return E;
end { findRules };

Figure 4: Algorithm FindRules



Proposition 4.9 Let MQ be a metaquery and DB a database. Let $\langle T, \chi, \lambda \rangle$ be a hypertree decomposition of
width c for MQ, and let $\sigma_{DB}$ be an instantiation. Define $\lambda'(p) = \sigma(\lambda(p))$ for each $p \in vertices(T)$. Then
$\langle T, \chi, \lambda' \rangle$ is a hypertree decomposition of width c.

Example 4.10 The width of $\langle T^{ex}, \chi^{ex}, \lambda^{ex} \rangle$ of Example 4.8 is 2. As $Q^{ex}$ is not semi-acyclic, it follows that 2
is also the hypertree-width of $Q^{ex}$.

Let Q be a set of atoms on a database DB, and let $\langle T, \chi, \lambda \rangle$ be a hypertree decomposition of Q. It is known
that we can build from Q, DB and $\langle T, \chi, \lambda \rangle$ a set of atoms Q', a database DB', and a join tree T' for Q', such
that Q' is semi-acyclic and $J(Q)$ on DB is equal to $J(Q')$ on DB'. Denote the above triplet (Q', DB', T')
as $acy(Q, DB, \langle T, \chi, \lambda \rangle)$. T' has the same tree shape as T. For each vertex p of T, there is precisely one vertex
p' in T', and one relation r' in DB'. p' is an atom having $\chi(p)$ as arguments, and r' is set to $\pi_{\chi(p)}(J(\lambda(p)))$. Q'
contains all the atoms corresponding to vertices of T'.



Example 4.11 Consider the set of atoms $Q^{ex}$ of Example 4.8, the associated hypertree decomposition
$\langle T^{ex}, \chi^{ex}, \lambda^{ex} \rangle$, and the database $DB^{ex} = \{p, q, r, s\}$. Let $p'_1 = p(A,B)$, $p'_2 = q(B,C)$, and
$p'_3 = t(B,C,D) = r(C,D) \bowtie s(B,D)$, let $T'^{ex}$ be the tree $(\{p'_1, p'_2, p'_3\}, \{(p'_1, p'_2), (p'_2, p'_3)\})$,
$Q'^{ex} = \{p(A,B), q(B,C), t(B,C,D)\}$, and let $DB'^{ex} = \{p, q, t\}$. Then
$acy(Q^{ex}, DB^{ex}, \langle T^{ex}, \chi^{ex}, \lambda^{ex} \rangle) = (Q'^{ex}, DB'^{ex}, T'^{ex})$.
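The relations of DB' in the acy construction are obtained by joining the relations in $\lambda(p)$ and projecting onto $\chi(p)$, one vertex at a time. A minimal Python sketch of that step, mirroring Example 4.11, is given below; the (schema, rows) encoding, the function names, and the sample tuples are assumptions for illustration and are not taken from the paper.

```python
def natural_join(left, right):
    """Natural join of two relations given as (schema, set of tuples)."""
    (ls, lrows), (rs, rrows) = left, right
    common = [v for v in ls if v in rs]
    extra = [v for v in rs if v not in ls]
    rows = {lrow + tuple(rrow[rs.index(v)] for v in extra)
            for lrow in lrows for rrow in rrows
            if all(lrow[ls.index(v)] == rrow[rs.index(v)] for v in common)}
    return ls + tuple(extra), rows


def project(rel, variables):
    """Projection of a relation onto an ordered tuple of variables."""
    schema, rows = rel
    return tuple(variables), {tuple(row[schema.index(v)] for v in variables)
                              for row in rows}


def acy_relations(db, chi, lam):
    """For each vertex p, compute the relation pi_chi(p)(J(lambda(p)))."""
    out = {}
    for p, names in lam.items():
        joined = None
        for name in names:
            joined = db[name] if joined is None else natural_join(joined, db[name])
        out[p] = project(joined, chi[p])
    return out


# Hypothetical contents for the database {p, q, r, s} of Example 4.11.
db = {
    "p": (("A", "B"), {(1, 2)}),
    "q": (("B", "C"), {(2, 3)}),
    "r": (("C", "D"), {(3, 4), (3, 5)}),
    "s": (("B", "D"), {(2, 4), (9, 5)}),
}
chi = {"p1": ("A", "B"), "p2": ("B", "C"), "p3": ("B", "C", "D")}
lam = {"p1": ["p"], "p2": ["q"], "p3": ["r", "s"]}
db_prime = acy_relations(db, chi, lam)  # db_prime["p3"] plays the role of t(B,C,D)
```

Since each vertex joins at most c relations of size at most d, each relation built this way has at most $d^c$ tuples, which is where the exponent c in the running time comes from.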



Theorem 4.12 Let r be a Horn rule, let DB be a database, and let d be the size of the largest relation
within DB. Then $sup(r)$ can be computed in time $d^c \log d$, where c is the hypertree-width of $body(r)$.

Proof. We can compute $sup(r)$ as follows:

1. set $Q = body(r) = r_1(X_1), \ldots, r_m(X_m)$;

2. compute the hypertree-width c of Q and a complete hypertree decomposition $\langle T, \chi, \lambda \rangle$ of Q having width c;

3. compute $acy(Q, DB, \langle T, \chi, \lambda \rangle) = (Q', DB', T')$;

4. compute the database $\overline{DB'}$, formed by the reduced set of relations $\overline{Q'}$ of DB', by executing a full reducer
for Q' (note that Q' is semi-acyclic);

5. compute from $\overline{DB'}$ the database $\overline{DB}$ containing the reduced set $Q'' = \{r''_1(X_1), \ldots, r''_m(X_m)\}$ of relations
in Q (see for details);

6. compute the sizes $d_1, \ldots, d_m$ of the relations in Q and the sizes $d''_1, \ldots, d''_m$ of the relations in Q'';

7. compute the value of $sup(r)$ as $\max_i (d''_i / d_i)$.
Computing the hypertree-width c of Q and a c-width hypertree decomposition of Q requires a constant amount
of time in the data complexity measure. It also follows that the size of $\langle T, \chi, \lambda \rangle$ is a constant. Therefore, in the
third step, the only operation depending on the input size is the computation of $\pi_{\chi(p)}(J(\lambda(p)))$ for each vertex
p of T. Since c is the hypertree-width of Q, this can be done in time $d^c$. An upper bound for the time complexity
of steps 4, 5, and 6 is $d^c \log d$. Finally, step 7 also requires a constant amount of time. □
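The value produced by steps 6 and 7 can be cross-checked on small inputs with a brute-force computation that skips the hypertree decomposition entirely: join all body atoms, project the result back onto each atom, and take the largest ratio between reduced and original size. The Python sketch below is such a naive baseline, not the procedure of the proof; the relation encoding, the names, and the sample data are assumptions, and each atom is assumed to have pairwise distinct variables.

```python
from fractions import Fraction


def natural_join(left, right):
    """Natural join of two relations given as (schema, set of tuples)."""
    (ls, lrows), (rs, rrows) = left, right
    common = [v for v in ls if v in rs]
    extra = [v for v in rs if v not in ls]
    rows = {lrow + tuple(rrow[rs.index(v)] for v in extra)
            for lrow in lrows for rrow in rrows
            if all(lrow[ls.index(v)] == rrow[rs.index(v)] for v in common)}
    return ls + tuple(extra), rows


def support(body):
    """body: list of (schema, rows) pairs, one per atom r_i(X_i).
    Returns max_i d''_i / d_i, where d''_i counts the tuples of r_i
    that survive in the join of the whole body."""
    schema, rows = body[0]
    for atom in body[1:]:
        schema, rows = natural_join((schema, rows), atom)
    best = Fraction(0)
    for variables, original in body:
        if original:  # skip empty relations to avoid dividing by zero
            reduced = {tuple(row[schema.index(v)] for v in variables)
                       for row in rows}
            best = max(best, Fraction(len(reduced), len(original)))
    return best


# Hypothetical body p(A,B), q(B,C), r(C,D).
body = [
    (("A", "B"), {(1, 2), (5, 6)}),
    (("B", "C"), {(2, 3), (6, 7), (8, 9)}),
    (("C", "D"), {(3, 4), (0, 0), (0, 1), (0, 2)}),
]
sup_value = support(body)
```

Here only the tuple (1, 2, 3, 4) survives the join, so the three ratios are 1/2, 1/3 and 1/4. The brute-force join can be much larger than $d^c$, which is what the decomposition-based procedure avoids.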

Definition 4.13 Let S be a set of literal schemes. We denote by $pv(S)$ the set of predicate variables occurring
in S. Let $\sigma_{DB}$ and $\mu_{DB}$ be instantiations defined on two sets of literal schemes $S_1$ and $S_2$ over a database
DB, respectively. We say that $\sigma$ and $\mu$ agree if

1. $\sigma(S) = \mu(S)$, where $S = S_1 \cap S_2$, and

2. $\sigma'(V) = \mu'(V)$, where

• $\sigma'$ and $\mu'$ are the restrictions of $\sigma$ and $\mu$ on the sets $pv(S_1)$ and $pv(S_2)$ of predicate variables of $S_1$
and $S_2$, respectively, and

• $V = pv(S_1) \cap pv(S_2)$.

Clearly, if $\sigma$ and $\mu$ agree, then $\sigma \circ \mu$ is an instantiation on the set $S_1 \cup S_2$.
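The agreement test and the composition of two instantiations can be illustrated with a small Python sketch. It assumes a simplified encoding that is not part of the paper: an instantiation is a dictionary from literal schemes to atoms, both written as (symbol, arguments) pairs, and the first component of every literal scheme is taken to be a predicate variable.

```python
def pv_map(inst):
    """Restriction of an instantiation to predicate variables:
    predicate variable -> relation name."""
    return {scheme[0]: atom[0] for scheme, atom in inst.items()}


def agree(sigma, mu):
    # Condition 1: the common literal schemes are instantiated identically.
    if any(sigma[s] != mu[s] for s in sigma.keys() & mu.keys()):
        return False
    # Condition 2: the common predicate variables get the same relation.
    ps, pm = pv_map(sigma), pv_map(mu)
    return all(ps[v] == pm[v] for v in ps.keys() & pm.keys())


def compose(sigma, mu):
    """Instantiation on the union of the two sets of literal schemes."""
    if not agree(sigma, mu):
        raise ValueError("instantiations do not agree")
    return {**sigma, **mu}


# Hypothetical literal schemes and atoms.
sigma = {("P", ("X", "Y")): ("p", ("X", "Y"))}
mu_ok = {("Q", ("Y", "Z")): ("q", ("Y", "Z")),
         ("P", ("Y", "W")): ("p", ("Y", "W"))}
mu_bad = {("P", ("Y", "W")): ("r", ("Y", "W"))}  # maps P to a different relation
combined = compose(sigma, mu_ok)
```

In this sketch `mu_bad` shares no literal scheme with `sigma`, so condition 1 holds vacuously, but it fails condition 2 because the predicate variable P is sent to two different relations.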

Figure 4 shows Algorithm findRules that, given in input a database DB, a metaquery MQ, three rational
numbers $0 \le k_{sup}, k_{cvr}, k_{cnf} < 1$, and $T \in \{0,1,2\}$, computes all the type-T instantiations $\sigma_{DB}$ such that
$sup(\sigma(MQ)) > k_{sup}$, $cvr(\sigma(MQ)) > k_{cvr}$, and $cnf(\sigma(MQ)) > k_{cnf}$.

Within the main procedure, the algorithm computes the hypertree decomposition $\langle T, \chi, \lambda \rangle$ of body(MQ). By
Proposition 4.9, this decomposition can be employed for each instantiation generated. Then, the procedure
findBodies performs a bottom-up visit of T that generates the instantiations $\sigma_b$ of the body of the metaquery
by composing the instantiations $\sigma_i$ of the visited literal schemes. It also computes the database DB' and its
reduced version $\overline{DB'}$ (see the proof of Theorem 4.12, steps (3) and (4)), and maintains in a data structure
(an array r) the portions of the database DB' shared by the partial instantiations generated during the visit,
reusing them. When the root of the tree T is reached, the first half of a full reducer for $\sigma_b(body(MQ))$ has
been calculated. Hence the procedure findBodies computes the second half of this full reducer (storing the
reduced relations into an array s) and calls the procedure findHeads. Finally, backtracking on previously
generated local substitutions is performed.

The procedure findHeads then verifies whether $\sigma_b(body(MQ))$ has large enough support (this is done by calling
the function enoughSupport) and, if so, it computes the associated relation and searches for
heads such that the instantiated metaquery has large enough cover and confidence.

As for the time-complexity analysis of algorithm findRules, in the data complexity setting, we note that 
during the bottom-up visit of T it performs a semijoin step for each edge of the visited tree, and that the 
number of edges visited by the algorithm is upper bounded by the number of instantiations of the body of the 
metaquery. 
Then we can compute all the instantiations $\sigma_{DB}$ such that $sup(\sigma(MQ)) > k_{sup}$ in $n^m d^c \log d$ steps for
$T \in \{0,1\}$, and in $(n b^a)^m d^c \log d$ steps for $T = 2$, where n is the number of relations within DB, m the
number of relation patterns of the metaquery MQ, a the maximum arity of any relation pattern in MQ, b
the maximum arity of any relation in DB, and c the hypertree-width of body(MQ). Note that a, m and
$1 \le c \le m - 1$ are constants. Moreover, ma is an upper bound for the metaquery size, nbd is an upper bound
for the number of attribute values in the database DB, and it usually holds that $d \gg nb$.

The additional steps needed to search for instantiations with high cover and confidence requires a total 
amount of (nd)" steps for T e {0, 1}, and (ub'^d)"' steps for T = 2. 

5 Conclusions 

In this paper we have formally defined the semantics of metaqueries and analyzed their computational complexity.
In general, as far as the combined complexity is concerned, metaquerying is intractable (unless P=NP).
Therefore, we have defined the class of acyclic metaqueries and shown that a subset of metaquerying problems
defined by acyclic metaqueries is in LOGCFL and, as such, is highly parallelizable. Moreover, we have studied
the data complexity of metaquerying problems and shown that, in general, it lies within $TC^0$. Even in this case,
we were able to single out a subset of metaquerying problems for which the data complexity is in $AC^0$. Finally,
we have discussed metaquery implementation. The complexity analysis of metaquerying problems presented
here, which is summarized in Figure 5, is not complete, though. We are working towards establishing other
results. In particular, it seems interesting to extend our formal framework to more general metaquery
forms, such as allowing negation and/or disjunction to occur in metapatterns.
Figure 5: Summary of complexity results.